LKML Archive mirror
* [PATCH v5 0/5] Improve visibility of writeback
@ 2024-04-23  3:46 Kemeng Shi
  2024-04-23  3:46 ` [PATCH v5 1/5] writeback: collect stats of all wb of bdi in bdi_debug_stats_show Kemeng Shi
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Kemeng Shi @ 2024-04-23  3:46 UTC (permalink / raw)
  To: akpm, willy, jack, bfoster, tj
  Cc: dsterba, mjguzik, dhowells, linux-kernel, linux-mm, linux-fsdevel

v4->v5:
-Add a new patch to fix the build problems of "writeback: support
retrieving per group debug writeback stats of bdi". The fix can be folded
into that patch if it has not been merged into the tree yet; otherwise it
can be picked up and applied on its own.

v3->v4:
-Fix the build warning that filepages, headroom and writeback in
cgwb_calc_thresh are used uninitialized when CONFIG_CGROUP_WRITEBACK
is not enabled.

v2->v3:
-Drop the patches that protected against a non-existent race and that
defined GDTC_INIT_NO_WB as null.
-Take a reference with wb_tryget on each wb whose stats are collected
into the bdi stats.
-Create wb_stats when CONFIG_CGROUP_WRITEBACK is not enabled.
-Add a blank line between two wb stats in wb_stats.

v1->v2:
-Send cleanup to wq_monitor.py separately.
-Add patch to avoid use after free of bdi.
-Rename wb_calc_cg_thresh to cgwb_calc_thresh as Tejun suggested.
-Use rcu walk to avoid use after free.
-Add debug output to each related patch.

This series improves the visibility of writeback. Patch 1 makes
/sys/kernel/debug/bdi/xxx/stats show writeback info of the whole bdi
instead of only the writeback info of the root cgroup. Patch 2 adds a new
debug file, /sys/kernel/debug/bdi/xxx/wb_stats, to show per-wb writeback
info. Patch 3 fixes the build problems of patch 2. Patch 4 adds
wb_monitor.py to monitor basic writeback info of a running system; more
info can be added on demand. Patch 5 is a small cleanup. More details can
be found in the respective patches. Thanks!

The following domain hierarchy was tested:
                global domain (320G)
                /                 \
        cgroup domain1(10G)     cgroup domain2(10G)
                |                 |
bdi            wb1               wb2

/* all writeback info of bdi is successfully collected */
cat stats
BdiWriteback:             4704 kB
BdiReclaimable:        1294496 kB
BdiDirtyThresh:      204208088 kB
DirtyThresh:         195259944 kB
BackgroundThresh:     32503588 kB
BdiDirtied:           48519296 kB
BdiWritten:           47225696 kB
BdiWriteBandwidth:     1173892 kBps
b_dirty:                     1
b_io:                        0
b_more_io:                   1
b_dirty_time:                0
bdi_list:                    1
state:                       1

/* per wb writeback info of bdi is collected */
cat /sys/kernel/debug/bdi/252:16/wb_stats
WbCgIno:                    1
WbWriteback:                0 kB
WbReclaimable:              0 kB
WbDirtyThresh:              0 kB
WbDirtied:                  0 kB
WbWritten:                  0 kB
WbWriteBandwidth:      102400 kBps
b_dirty:                    0
b_io:                       0
b_more_io:                  0
b_dirty_time:               0
state:                      1

WbCgIno:                 4208
WbWriteback:            59808 kB
WbReclaimable:         676480 kB
WbDirtyThresh:        6004624 kB
WbDirtied:           23348192 kB
WbWritten:           22614592 kB
WbWriteBandwidth:      593204 kBps
b_dirty:                    1
b_io:                       1
b_more_io:                  0
b_dirty_time:               0
state:                      7

WbCgIno:                 4249
WbWriteback:           144256 kB
WbReclaimable:         432096 kB
WbDirtyThresh:        6004344 kB
WbDirtied:           25727744 kB
WbWritten:           25154752 kB
WbWriteBandwidth:      577904 kBps
b_dirty:                    0
b_io:                       1
b_more_io:                  0
b_dirty_time:               0
state:                      7
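
For scripting against wb_stats, the output above can be split on the blank
line that separates two wbs. The following is only a minimal sketch: the
function name parse_wb_stats is made up for illustration, and the sample
text is copied from the two cgroup wbs shown above. Note that "state" is
printed in hex (%lx) while all other values are decimal.

```python
import re

# Sample copied from the wb_stats output above (the two cgroup wbs).
SAMPLE = """\
WbCgIno:                 4208
WbWriteback:            59808 kB
WbReclaimable:         676480 kB
WbDirtyThresh:        6004624 kB
WbDirtied:           23348192 kB
WbWritten:           22614592 kB
WbWriteBandwidth:      593204 kBps
b_dirty:                    1
b_io:                       1
b_more_io:                  0
b_dirty_time:               0
state:                      7

WbCgIno:                 4249
WbWriteback:           144256 kB
WbReclaimable:         432096 kB
WbDirtyThresh:        6004344 kB
WbDirtied:           25727744 kB
WbWritten:           25154752 kB
WbWriteBandwidth:      577904 kBps
b_dirty:                    0
b_io:                       1
b_more_io:                  0
b_dirty_time:               0
state:                      7
"""


def parse_wb_stats(text):
    """Return one dict per wb; blocks are separated by a blank line."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        record = {}
        for line in block.splitlines():
            key, _, value = line.partition(":")
            key = key.strip()
            fields = value.split()
            if not fields:
                continue
            # "state" is printed with %lx, everything else with %lu.
            base = 16 if key == "state" else 10
            record[key] = int(fields[0], base)
        records.append(record)
    return records


records = parse_wb_stats(SAMPLE)
for rec in records:
    print(rec["WbCgIno"], rec["WbWriteback"], rec["WbWriteBandwidth"])
```

The unit suffix (kB, kBps) is dropped here; a real tool would probably want
to keep it.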

The wb_monitor.py script output is as follows:
./wb_monitor.py 252:16 -c
                  writeback  reclaimable   dirtied   written    avg_bw
252:16_1                  0            0         0         0    102400
252:16_4284             672       820064   9230368   8410304    685612
252:16_4325             896       819840  10491264   9671648    652348
252:16                 1568      1639904  19721632  18081952   1440360

                  writeback  reclaimable   dirtied   written    avg_bw
252:16_1                  0            0         0         0    102400
252:16_4284             672       820064   9230368   8410304    685612
252:16_4325             896       819840  10491264   9671648    652348
252:16                 1568      1639904  19721632  18081952   1440360
...
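
The last row of each table (252:16) is the bdi total, i.e. the sum of the
per-wb rows above it. A minimal sketch of that aggregation, using the
numbers from the output above; the dict layout and the helper name
bdi_total are assumptions made for illustration and are not the script's
internal data structures:

```python
COLUMNS = ("writeback", "reclaimable", "dirtied", "written", "avg_bw")

# Per-wb rows copied from the wb_monitor.py output above.
per_wb = {
    "252:16_1":    (0, 0, 0, 0, 102400),
    "252:16_4284": (672, 820064, 9230368, 8410304, 685612),
    "252:16_4325": (896, 819840, 10491264, 9671648, 652348),
}


def bdi_total(rows):
    """Sum each column over all wbs of one bdi."""
    totals = [0] * len(COLUMNS)
    for values in rows:
        for i, value in enumerate(values):
            totals[i] += value
    return dict(zip(COLUMNS, totals))


total = bdi_total(per_wb.values())
print(total)
```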


Kemeng Shi (5):
  writeback: collect stats of all wb of bdi in bdi_debug_stats_show
  writeback: support retrieving per group debug writeback stats of bdi
  writeback: fix build problems of "writeback: support retrieving per
    group debug writeback stats of bdi"
  writeback: add wb_monitor.py script to monitor writeback info on bdi
  writeback: rename nr_reclaimable to nr_dirty in balance_dirty_pages

 include/linux/writeback.h     |   1 +
 mm/backing-dev.c              | 177 +++++++++++++++++++++++++++++-----
 mm/page-writeback.c           |  27 +++++-
 tools/writeback/wb_monitor.py | 172 +++++++++++++++++++++++++++++++++
 4 files changed, 348 insertions(+), 29 deletions(-)
 create mode 100644 tools/writeback/wb_monitor.py

-- 
2.30.0



* [PATCH v5 1/5] writeback: collect stats of all wb of bdi in bdi_debug_stats_show
  2024-04-23  3:46 [PATCH v5 0/5] Improve visibility of writeback Kemeng Shi
@ 2024-04-23  3:46 ` Kemeng Shi
  2024-04-23  3:46 ` [PATCH v5 2/5] writeback: support retrieving per group debug writeback stats of bdi Kemeng Shi
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Kemeng Shi @ 2024-04-23  3:46 UTC (permalink / raw)
  To: akpm, willy, jack, bfoster, tj
  Cc: dsterba, mjguzik, dhowells, linux-kernel, linux-mm, linux-fsdevel

/sys/kernel/debug/bdi/xxx/stats is supposed to show writeback information
of the whole bdi, but only the writeback information of the wb in the root
cgroup is collected, so writeback information of non-root cgroups is
missing. To be more specific, consider the following case:

/* create writeback cgroup */
cd /sys/fs/cgroup
echo "+memory +io" > cgroup.subtree_control
mkdir group1
cd group1
echo $$ > cgroup.procs
/* do writeback in cgroup */
fio -name test -filename=/dev/vdb ...
/* get writeback info of bdi */
cat /sys/kernel/debug/bdi/xxx/stats
The output of cat unexpectedly implies that there is no writeback on the
target bdi.

Fix this by collecting the stats of all wbs of the bdi instead of only the
wb in the root cgroup.

The following domain hierarchy was tested:
                global domain (320G)
                /                 \
        cgroup domain1(10G)     cgroup domain2(10G)
                |                 |
bdi            wb1               wb2

/* all writeback info of bdi is successfully collected */
cat stats
BdiWriteback:             2912 kB
BdiReclaimable:        1598464 kB
BdiDirtyThresh:      167479028 kB
DirtyThresh:         195038532 kB
BackgroundThresh:     32466728 kB
BdiDirtied:           19141696 kB
BdiWritten:           17543456 kB
BdiWriteBandwidth:     1136172 kBps
b_dirty:                     2
b_io:                        0
b_more_io:                   1
b_dirty_time:                0
bdi_list:                    1
state:                       1

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 mm/backing-dev.c | 96 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 73 insertions(+), 23 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 5fa3666356f9..089146feb830 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -39,6 +39,19 @@ struct workqueue_struct *bdi_wq;
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
 
+struct wb_stats {
+	unsigned long nr_dirty;
+	unsigned long nr_io;
+	unsigned long nr_more_io;
+	unsigned long nr_dirty_time;
+	unsigned long nr_writeback;
+	unsigned long nr_reclaimable;
+	unsigned long nr_dirtied;
+	unsigned long nr_written;
+	unsigned long dirty_thresh;
+	unsigned long wb_thresh;
+};
+
 static struct dentry *bdi_debug_root;
 
 static void bdi_debug_init(void)
@@ -46,31 +59,68 @@ static void bdi_debug_init(void)
 	bdi_debug_root = debugfs_create_dir("bdi", NULL);
 }
 
-static int bdi_debug_stats_show(struct seq_file *m, void *v)
+static void collect_wb_stats(struct wb_stats *stats,
+			     struct bdi_writeback *wb)
 {
-	struct backing_dev_info *bdi = m->private;
-	struct bdi_writeback *wb = &bdi->wb;
-	unsigned long background_thresh;
-	unsigned long dirty_thresh;
-	unsigned long wb_thresh;
-	unsigned long nr_dirty, nr_io, nr_more_io, nr_dirty_time;
 	struct inode *inode;
 
-	nr_dirty = nr_io = nr_more_io = nr_dirty_time = 0;
 	spin_lock(&wb->list_lock);
 	list_for_each_entry(inode, &wb->b_dirty, i_io_list)
-		nr_dirty++;
+		stats->nr_dirty++;
 	list_for_each_entry(inode, &wb->b_io, i_io_list)
-		nr_io++;
+		stats->nr_io++;
 	list_for_each_entry(inode, &wb->b_more_io, i_io_list)
-		nr_more_io++;
+		stats->nr_more_io++;
 	list_for_each_entry(inode, &wb->b_dirty_time, i_io_list)
 		if (inode->i_state & I_DIRTY_TIME)
-			nr_dirty_time++;
+			stats->nr_dirty_time++;
 	spin_unlock(&wb->list_lock);
 
+	stats->nr_writeback += wb_stat(wb, WB_WRITEBACK);
+	stats->nr_reclaimable += wb_stat(wb, WB_RECLAIMABLE);
+	stats->nr_dirtied += wb_stat(wb, WB_DIRTIED);
+	stats->nr_written += wb_stat(wb, WB_WRITTEN);
+	stats->wb_thresh += wb_calc_thresh(wb, stats->dirty_thresh);
+}
+
+#ifdef CONFIG_CGROUP_WRITEBACK
+static void bdi_collect_stats(struct backing_dev_info *bdi,
+			      struct wb_stats *stats)
+{
+	struct bdi_writeback *wb;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) {
+		if (!wb_tryget(wb))
+			continue;
+
+		collect_wb_stats(stats, wb);
+		wb_put(wb);
+	}
+	rcu_read_unlock();
+}
+#else
+static void bdi_collect_stats(struct backing_dev_info *bdi,
+			      struct wb_stats *stats)
+{
+	collect_wb_stats(stats, &bdi->wb);
+}
+#endif
+
+static int bdi_debug_stats_show(struct seq_file *m, void *v)
+{
+	struct backing_dev_info *bdi = m->private;
+	unsigned long background_thresh;
+	unsigned long dirty_thresh;
+	struct wb_stats stats;
+	unsigned long tot_bw;
+
 	global_dirty_limits(&background_thresh, &dirty_thresh);
-	wb_thresh = wb_calc_thresh(wb, dirty_thresh);
+
+	memset(&stats, 0, sizeof(stats));
+	stats.dirty_thresh = dirty_thresh;
+	bdi_collect_stats(bdi, &stats);
+	tot_bw = atomic_long_read(&bdi->tot_write_bandwidth);
 
 	seq_printf(m,
 		   "BdiWriteback:       %10lu kB\n"
@@ -87,18 +137,18 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 		   "b_dirty_time:       %10lu\n"
 		   "bdi_list:           %10u\n"
 		   "state:              %10lx\n",
-		   (unsigned long) K(wb_stat(wb, WB_WRITEBACK)),
-		   (unsigned long) K(wb_stat(wb, WB_RECLAIMABLE)),
-		   K(wb_thresh),
+		   K(stats.nr_writeback),
+		   K(stats.nr_reclaimable),
+		   K(stats.wb_thresh),
 		   K(dirty_thresh),
 		   K(background_thresh),
-		   (unsigned long) K(wb_stat(wb, WB_DIRTIED)),
-		   (unsigned long) K(wb_stat(wb, WB_WRITTEN)),
-		   (unsigned long) K(wb->write_bandwidth),
-		   nr_dirty,
-		   nr_io,
-		   nr_more_io,
-		   nr_dirty_time,
+		   K(stats.nr_dirtied),
+		   K(stats.nr_written),
+		   K(tot_bw),
+		   stats.nr_dirty,
+		   stats.nr_io,
+		   stats.nr_more_io,
+		   stats.nr_dirty_time,
 		   !list_empty(&bdi->bdi_list), bdi->wb.state);
 
 	return 0;
-- 
2.30.0



* [PATCH v5 2/5] writeback: support retrieving per group debug writeback stats of bdi
  2024-04-23  3:46 [PATCH v5 0/5] Improve visibility of writeback Kemeng Shi
  2024-04-23  3:46 ` [PATCH v5 1/5] writeback: collect stats of all wb of bdi in bdi_debug_stats_show Kemeng Shi
@ 2024-04-23  3:46 ` Kemeng Shi
  2024-04-23  3:46 ` [PATCH v5 3/5] writeback: fix build problems of "writeback: support retrieving per group debug writeback stats of bdi" Kemeng Shi
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Kemeng Shi @ 2024-04-23  3:46 UTC (permalink / raw)
  To: akpm, willy, jack, bfoster, tj
  Cc: dsterba, mjguzik, dhowells, linux-kernel, linux-mm, linux-fsdevel

Add /sys/kernel/debug/bdi/xxx/wb_stats to show per group writeback stats
of bdi.

The following domain hierarchy was tested:
                global domain (320G)
                /                 \
        cgroup domain1(10G)     cgroup domain2(10G)
                |                 |
bdi            wb1               wb2

/* per wb writeback info of bdi is collected */
cat wb_stats
WbCgIno:                    1
WbWriteback:                0 kB
WbReclaimable:              0 kB
WbDirtyThresh:              0 kB
WbDirtied:                  0 kB
WbWritten:                  0 kB
WbWriteBandwidth:      102400 kBps
b_dirty:                    0
b_io:                       0
b_more_io:                  0
b_dirty_time:               0
state:                      1

WbCgIno:                 4091
WbWriteback:             1792 kB
WbReclaimable:         820512 kB
WbDirtyThresh:        6004692 kB
WbDirtied:            1820448 kB
WbWritten:             999488 kB
WbWriteBandwidth:      169020 kBps
b_dirty:                    0
b_io:                       0
b_more_io:                  1
b_dirty_time:               0
state:                      5

WbCgIno:                 4131
WbWriteback:             1120 kB
WbReclaimable:         820064 kB
WbDirtyThresh:        6004728 kB
WbDirtied:            1822688 kB
WbWritten:            1002400 kB
WbWriteBandwidth:      153520 kBps
b_dirty:                    0
b_io:                       0
b_more_io:                  1
b_dirty_time:               0
state:                      5

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 include/linux/writeback.h |  1 +
 mm/backing-dev.c          | 78 ++++++++++++++++++++++++++++++++++++++-
 mm/page-writeback.c       | 19 ++++++++++
 3 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 9845cb62e40b..112d806ddbe4 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -355,6 +355,7 @@ int dirtytime_interval_handler(struct ctl_table *table, int write,
 
 void global_dirty_limits(unsigned long *pbackground, unsigned long *pdirty);
 unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh);
+unsigned long cgwb_calc_thresh(struct bdi_writeback *wb);
 
 void wb_update_bandwidth(struct bdi_writeback *wb);
 
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 089146feb830..6ecd11bdce6e 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -155,19 +155,93 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
 }
 DEFINE_SHOW_ATTRIBUTE(bdi_debug_stats);
 
+static void wb_stats_show(struct seq_file *m, struct bdi_writeback *wb,
+			  struct wb_stats *stats)
+{
+
+	seq_printf(m,
+		   "WbCgIno:           %10lu\n"
+		   "WbWriteback:       %10lu kB\n"
+		   "WbReclaimable:     %10lu kB\n"
+		   "WbDirtyThresh:     %10lu kB\n"
+		   "WbDirtied:         %10lu kB\n"
+		   "WbWritten:         %10lu kB\n"
+		   "WbWriteBandwidth:  %10lu kBps\n"
+		   "b_dirty:           %10lu\n"
+		   "b_io:              %10lu\n"
+		   "b_more_io:         %10lu\n"
+		   "b_dirty_time:      %10lu\n"
+		   "state:             %10lx\n\n",
+		   cgroup_ino(wb->memcg_css->cgroup),
+		   K(stats->nr_writeback),
+		   K(stats->nr_reclaimable),
+		   K(stats->wb_thresh),
+		   K(stats->nr_dirtied),
+		   K(stats->nr_written),
+		   K(wb->avg_write_bandwidth),
+		   stats->nr_dirty,
+		   stats->nr_io,
+		   stats->nr_more_io,
+		   stats->nr_dirty_time,
+		   wb->state);
+}
+
+static int cgwb_debug_stats_show(struct seq_file *m, void *v)
+{
+	struct backing_dev_info *bdi = m->private;
+	unsigned long background_thresh;
+	unsigned long dirty_thresh;
+	struct bdi_writeback *wb;
+	struct wb_stats stats;
+
+	global_dirty_limits(&background_thresh, &dirty_thresh);
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) {
+		struct wb_stats stats = { .dirty_thresh = dirty_thresh };
+
+		if (!wb_tryget(wb))
+			continue;
+
+		collect_wb_stats(&stats, wb);
+
+		/*
+		 * Calculate thresh of wb in writeback cgroup which is min of
+		 * thresh in global domain and thresh in cgroup domain. Drop
+		 * rcu lock because cgwb_calc_thresh may sleep in
+		 * cgroup_rstat_flush. We can do so here because we have a ref.
+		 */
+		if (mem_cgroup_wb_domain(wb)) {
+			rcu_read_unlock();
+			stats.wb_thresh = min(stats.wb_thresh, cgwb_calc_thresh(wb));
+			rcu_read_lock();
+		}
+
+		wb_stats_show(m, wb, &stats);
+
+		wb_put(wb);
+	}
+	rcu_read_unlock();
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(cgwb_debug_stats);
+
 static void bdi_debug_register(struct backing_dev_info *bdi, const char *name)
 {
 	bdi->debug_dir = debugfs_create_dir(name, bdi_debug_root);
 
 	debugfs_create_file("stats", 0444, bdi->debug_dir, bdi,
 			    &bdi_debug_stats_fops);
+	debugfs_create_file("wb_stats", 0444, bdi->debug_dir, bdi,
+			    &cgwb_debug_stats_fops);
 }
 
 static void bdi_debug_unregister(struct backing_dev_info *bdi)
 {
 	debugfs_remove_recursive(bdi->debug_dir);
 }
-#else
+#else /* CONFIG_DEBUG_FS */
 static inline void bdi_debug_init(void)
 {
 }
@@ -178,7 +252,7 @@ static inline void bdi_debug_register(struct backing_dev_info *bdi,
 static inline void bdi_debug_unregister(struct backing_dev_info *bdi)
 {
 }
-#endif
+#endif /* CONFIG_DEBUG_FS */
 
 static ssize_t read_ahead_kb_store(struct device *dev,
 				  struct device_attribute *attr,
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 3e19b87049db..3bb3bed102ef 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -892,6 +892,25 @@ unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh)
 	return __wb_calc_thresh(&gdtc);
 }
 
+unsigned long cgwb_calc_thresh(struct bdi_writeback *wb)
+{
+	struct dirty_throttle_control gdtc = { GDTC_INIT_NO_WB };
+	struct dirty_throttle_control mdtc = { MDTC_INIT(wb, &gdtc) };
+	unsigned long filepages = 0, headroom = 0, writeback = 0;
+
+	gdtc.avail = global_dirtyable_memory();
+	gdtc.dirty = global_node_page_state(NR_FILE_DIRTY) +
+		     global_node_page_state(NR_WRITEBACK);
+
+	mem_cgroup_wb_stats(wb, &filepages, &headroom,
+			    &mdtc.dirty, &writeback);
+	mdtc.dirty += writeback;
+	mdtc_calc_avail(&mdtc, filepages, headroom);
+	domain_dirty_limits(&mdtc);
+
+	return __wb_calc_thresh(&mdtc);
+}
+
 /*
  *                           setpoint - dirty 3
  *        f(dirty) := 1.0 + (----------------)
-- 
2.30.0



* [PATCH v5 3/5] writeback: fix build problems of "writeback: support retrieving per group debug writeback stats of bdi"
  2024-04-23  3:46 [PATCH v5 0/5] Improve visibility of writeback Kemeng Shi
  2024-04-23  3:46 ` [PATCH v5 1/5] writeback: collect stats of all wb of bdi in bdi_debug_stats_show Kemeng Shi
  2024-04-23  3:46 ` [PATCH v5 2/5] writeback: support retrieving per group debug writeback stats of bdi Kemeng Shi
@ 2024-04-23  3:46 ` Kemeng Shi
  2024-04-23  3:53   ` Kemeng Shi
  2024-04-24 13:27   ` Johannes Weiner
  2024-04-23  3:46 ` [PATCH v5 4/5] writeback: add wb_monitor.py script to monitor writeback info on bdi Kemeng Shi
  2024-04-23  3:46 ` [PATCH v5 5/5] writeback: rename nr_reclaimable to nr_dirty in balance_dirty_pages Kemeng Shi
  4 siblings, 2 replies; 9+ messages in thread
From: Kemeng Shi @ 2024-04-23  3:46 UTC (permalink / raw)
  To: akpm, willy, jack, bfoster, tj
  Cc: dsterba, mjguzik, dhowells, linux-kernel, linux-mm, linux-fsdevel

Fix two build problems:
1. implicit declaration of function 'cgroup_ino'.
2. unused variable 'stats'.

After this fix, no build problem is found when CGROUPS is disabled, and
wb_stats can be retrieved successfully when CGROUP_WRITEBACK is
disabled:
cat wb_stats
WbCgIno:                    1
WbWriteback:                0 kB
WbReclaimable:         685440 kB
WbDirtyThresh:      195530960 kB
WbDirtied:             691488 kB
WbWritten:               6048 kB
WbWriteBandwidth:      102400 kBps
b_dirty:                    2
b_io:                       0
b_more_io:                  0
b_dirty_time:               0
state:                      5

cat wb_stats
WbCgIno:                    1
WbWriteback:                0 kB
WbReclaimable:         818944 kB
WbDirtyThresh:      195527484 kB
WbDirtied:             824992 kB
WbWritten:               6048 kB
WbWriteBandwidth:      102400 kBps
b_dirty:                    2
b_io:                       0
b_more_io:                  0
b_dirty_time:               0
state:                      5

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 mm/backing-dev.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 6ecd11bdce6e..e61bbb1bd622 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -172,7 +172,11 @@ static void wb_stats_show(struct seq_file *m, struct bdi_writeback *wb,
 		   "b_more_io:         %10lu\n"
 		   "b_dirty_time:      %10lu\n"
 		   "state:             %10lx\n\n",
+#ifdef CONFIG_CGROUP_WRITEBACK
 		   cgroup_ino(wb->memcg_css->cgroup),
+#else
+		   1ul,
+#endif
 		   K(stats->nr_writeback),
 		   K(stats->nr_reclaimable),
 		   K(stats->wb_thresh),
@@ -192,7 +196,6 @@ static int cgwb_debug_stats_show(struct seq_file *m, void *v)
 	unsigned long background_thresh;
 	unsigned long dirty_thresh;
 	struct bdi_writeback *wb;
-	struct wb_stats stats;
 
 	global_dirty_limits(&background_thresh, &dirty_thresh);
 
-- 
2.30.0



* [PATCH v5 4/5] writeback: add wb_monitor.py script to monitor writeback info on bdi
  2024-04-23  3:46 [PATCH v5 0/5] Improve visibility of writeback Kemeng Shi
                   ` (2 preceding siblings ...)
  2024-04-23  3:46 ` [PATCH v5 3/5] writeback: fix build problems of "writeback: support retrieving per group debug writeback stats of bdi" Kemeng Shi
@ 2024-04-23  3:46 ` Kemeng Shi
  2024-04-23  3:46 ` [PATCH v5 5/5] writeback: rename nr_reclaimable to nr_dirty in balance_dirty_pages Kemeng Shi
  4 siblings, 0 replies; 9+ messages in thread
From: Kemeng Shi @ 2024-04-23  3:46 UTC (permalink / raw)
  To: akpm, willy, jack, bfoster, tj
  Cc: dsterba, mjguzik, dhowells, linux-kernel, linux-mm, linux-fsdevel

Add the wb_monitor.py script to monitor writeback information on a backing
device, which makes it easier to observe the writeback behavior of a
running system.

The wb_monitor.py script is written based on wq_monitor.py.
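
The positional arguments are treated as regular expressions and joined
with '|', so a bdi is shown when its name matches any of them; with no
arguments every bdi is shown. A minimal sketch of that filter follows. The
helper names build_filter and selected and the device names in the usage
lines are made up for illustration; the script itself builds the pattern
in a loop, which should give the same result.

```python
import re


def build_filter(patterns):
    """Combine the name patterns into one regex; None means 'match all'."""
    return re.compile("|".join(patterns)) if patterns else None


def selected(name, filter_re):
    """True if the bdi with this name should be displayed."""
    return filter_re is None or filter_re.search(name) is not None


flt = build_filter(["252:16", "^8:"])
print(selected("252:16", flt), selected("8:0", flt), selected("253:0", flt))
```

Since search() is used rather than a full match, "252:16" also selects any
name that merely contains that string; anchor the pattern with ^ and $ if
an exact match is wanted.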

The following domain hierarchy was tested:
                global domain (320G)
                /                 \
        cgroup domain1(10G)     cgroup domain2(10G)
                |                 |
bdi            wb1               wb2

The wb_monitor.py script output is as follows:
./wb_monitor.py 252:16 -c
                  writeback  reclaimable   dirtied   written    avg_bw
252:16_1                  0            0         0         0    102400
252:16_4284             672       820064   9230368   8410304    685612
252:16_4325             896       819840  10491264   9671648    652348
252:16                 1568      1639904  19721632  18081952   1440360

                  writeback  reclaimable   dirtied   written    avg_bw
252:16_1                  0            0         0         0    102400
252:16_4284             672       820064   9230368   8410304    685612
252:16_4325             896       819840  10491264   9671648    652348
252:16                 1568      1639904  19721632  18081952   1440360
...
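
All columns are in kB (avg_bw in kB per second). The counters read from
the kernel are in pages and are converted with K(x) = x << (PAGE_SHIFT - 10).
A minimal sketch of that conversion, assuming 4 KiB pages (PAGE_SHIFT = 12);
the script reads the real PAGE_SHIFT from the running kernel through drgn,
and the function name pages_to_kb is made up for illustration:

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages; the script reads this via drgn


def pages_to_kb(pages):
    """Convert a page count to kB, mirroring K() in wb_monitor.py."""
    return pages << (PAGE_SHIFT - 10)


# Under this assumption, 168 pages under writeback show up as 672 kB and
# 25600 pages/s of write bandwidth as 102400 kBps.
print(pages_to_kb(168), pages_to_kb(25600))
```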

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Suggested-by: Tejun Heo <tj@kernel.org>
---
 tools/writeback/wb_monitor.py | 172 ++++++++++++++++++++++++++++++++++
 1 file changed, 172 insertions(+)
 create mode 100644 tools/writeback/wb_monitor.py

diff --git a/tools/writeback/wb_monitor.py b/tools/writeback/wb_monitor.py
new file mode 100644
index 000000000000..5e3591f1f9a9
--- /dev/null
+++ b/tools/writeback/wb_monitor.py
@@ -0,0 +1,172 @@
+#!/usr/bin/env drgn
+#
+# Copyright (C) 2024 Kemeng Shi <shikemeng@huaweicloud.com>
+# Copyright (C) 2024 Huawei Inc
+
+desc = """
+This is a drgn script based on wq_monitor.py to monitor writeback info on
+backing dev. For more info on drgn, visit https://github.com/osandov/drgn.
+
+  writeback(kB)     Amount of dirty pages currently being written back to
+                    disk.
+
+  reclaimable(kB)   Amount of pages currently reclaimable.
+
+  dirtied(kB)       Amount of pages that have been dirtied.
+
+  written(kB)       Amount of dirty pages that have been written back to disk.
+
+  avg_bw(kBps)      Smoothed estimate of the bandwidth of writing dirty pages
+                    back to disk.
+"""
+
+import signal
+import re
+import time
+import json
+
+import drgn
+from drgn.helpers.linux.list import list_for_each_entry
+
+import argparse
+parser = argparse.ArgumentParser(description=desc,
+                                 formatter_class=argparse.RawTextHelpFormatter)
+parser.add_argument('bdi', metavar='REGEX', nargs='*',
+                    help='Target backing device name patterns (all if empty)')
+parser.add_argument('-i', '--interval', metavar='SECS', type=float, default=1,
+                    help='Monitoring interval (0 to print once and exit)')
+parser.add_argument('-j', '--json', action='store_true',
+                    help='Output in json')
+parser.add_argument('-c', '--cgroup', action='store_true',
+                    help='show writeback of bdi in cgroup')
+args = parser.parse_args()
+
+bdi_list                = prog['bdi_list']
+
+WB_RECLAIMABLE          = prog['WB_RECLAIMABLE']
+WB_WRITEBACK            = prog['WB_WRITEBACK']
+WB_DIRTIED              = prog['WB_DIRTIED']
+WB_WRITTEN              = prog['WB_WRITTEN']
+NR_WB_STAT_ITEMS        = prog['NR_WB_STAT_ITEMS']
+
+PAGE_SHIFT              = prog['PAGE_SHIFT']
+
+def K(x):
+    return x << (PAGE_SHIFT - 10)
+
+class Stats:
+    def dict(self, now):
+        return { 'timestamp'            : now,
+                 'name'                 : self.name,
+                 'writeback'            : self.stats[WB_WRITEBACK],
+                 'reclaimable'          : self.stats[WB_RECLAIMABLE],
+                 'dirtied'              : self.stats[WB_DIRTIED],
+                 'written'              : self.stats[WB_WRITTEN],
+                 'avg_wb'               : self.avg_bw, }
+
+    def table_header_str():
+        return f'{"":>16} {"writeback":>10} {"reclaimable":>12} ' \
+                f'{"dirtied":>9} {"written":>9} {"avg_bw":>9}'
+
+    def table_row_str(self):
+        out = f'{self.name[-16:]:16} ' \
+              f'{self.stats[WB_WRITEBACK]:10} ' \
+              f'{self.stats[WB_RECLAIMABLE]:12} ' \
+              f'{self.stats[WB_DIRTIED]:9} ' \
+              f'{self.stats[WB_WRITTEN]:9} ' \
+              f'{self.avg_bw:9} '
+        return out
+
+    def show_header():
+        if Stats.table_fmt:
+            print()
+            print(Stats.table_header_str())
+
+    def show_stats(self):
+        if Stats.table_fmt:
+            print(self.table_row_str())
+        else:
+            print(self.dict(Stats.now))
+
+class WbStats(Stats):
+    def __init__(self, wb):
+        bdi_name = wb.bdi.dev_name.string_().decode()
+        # avoid to use bdi.wb.memcg_css which is only defined when
+        # CONFIG_CGROUP_WRITEBACK is enabled
+        if wb == wb.bdi.wb.address_of_():
+            ino = "1"
+        else:
+            ino = str(wb.memcg_css.cgroup.kn.id.value_())
+        self.name = bdi_name + '_' + ino
+
+        self.stats = [0] * NR_WB_STAT_ITEMS
+        for i in range(NR_WB_STAT_ITEMS):
+            if wb.stat[i].count >= 0:
+                self.stats[i] = int(K(wb.stat[i].count))
+            else:
+                self.stats[i] = 0
+
+        self.avg_bw = int(K(wb.avg_write_bandwidth))
+
+class BdiStats(Stats):
+    def __init__(self, bdi):
+        self.name = bdi.dev_name.string_().decode()
+        self.stats = [0] * NR_WB_STAT_ITEMS
+        self.avg_bw = 0
+
+    def collectStats(self, wb_stats):
+        for i in range(NR_WB_STAT_ITEMS):
+            self.stats[i] += wb_stats.stats[i]
+
+        self.avg_bw += wb_stats.avg_bw
+
+exit_req = False
+
+def sigint_handler(signr, frame):
+    global exit_req
+    exit_req = True
+
+def main():
+    # handle args
+    Stats.table_fmt = not args.json
+    interval = args.interval
+    cgroup = args.cgroup
+
+    re_str = None
+    if args.bdi:
+        for r in args.bdi:
+            if re_str is None:
+                re_str = r
+            else:
+                re_str += '|' + r
+
+    filter_re = re.compile(re_str) if re_str else None
+
+    # monitoring loop
+    signal.signal(signal.SIGINT, sigint_handler)
+
+    while not exit_req:
+        Stats.now = time.time()
+
+        Stats.show_header()
+        for bdi in list_for_each_entry('struct backing_dev_info', bdi_list.address_of_(), 'bdi_list'):
+            bdi_stats = BdiStats(bdi)
+            if filter_re and not filter_re.search(bdi_stats.name):
+                continue
+
+            for wb in list_for_each_entry('struct bdi_writeback', bdi.wb_list.address_of_(), 'bdi_node'):
+                wb_stats = WbStats(wb)
+                bdi_stats.collectStats(wb_stats)
+                if cgroup:
+                    wb_stats.show_stats()
+
+            bdi_stats.show_stats()
+            if cgroup and Stats.table_fmt:
+                print()
+
+        if interval == 0:
+            break
+        time.sleep(interval)
+
+if __name__ == "__main__":
+    main()
-- 
2.30.0



* [PATCH v5 5/5] writeback: rename nr_reclaimable to nr_dirty in balance_dirty_pages
  2024-04-23  3:46 [PATCH v5 0/5] Improve visibility of writeback Kemeng Shi
                   ` (3 preceding siblings ...)
  2024-04-23  3:46 ` [PATCH v5 4/5] writeback: add wb_monitor.py script to monitor writeback info on bdi Kemeng Shi
@ 2024-04-23  3:46 ` Kemeng Shi
  4 siblings, 0 replies; 9+ messages in thread
From: Kemeng Shi @ 2024-04-23  3:46 UTC (permalink / raw)
  To: akpm, willy, jack, bfoster, tj
  Cc: dsterba, mjguzik, dhowells, linux-kernel, linux-mm, linux-fsdevel

Commit 8d92890bd6b85 ("mm/writeback: discard NR_UNSTABLE_NFS, use
NR_WRITEBACK instead") removed NR_UNSTABLE_NFS, so nr_reclaimable now
only counts dirty pages.
Rename nr_reclaimable to nr_dirty accordingly.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 mm/page-writeback.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 3bb3bed102ef..44df5c899a33 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1694,7 +1694,7 @@ static int balance_dirty_pages(struct bdi_writeback *wb,
 	struct dirty_throttle_control * const mdtc = mdtc_valid(&mdtc_stor) ?
 						     &mdtc_stor : NULL;
 	struct dirty_throttle_control *sdtc;
-	unsigned long nr_reclaimable;	/* = file_dirty */
+	unsigned long nr_dirty;
 	long period;
 	long pause;
 	long max_pause;
@@ -1715,9 +1715,9 @@ static int balance_dirty_pages(struct bdi_writeback *wb,
 		unsigned long m_thresh = 0;
 		unsigned long m_bg_thresh = 0;
 
-		nr_reclaimable = global_node_page_state(NR_FILE_DIRTY);
+		nr_dirty = global_node_page_state(NR_FILE_DIRTY);
 		gdtc->avail = global_dirtyable_memory();
-		gdtc->dirty = nr_reclaimable + global_node_page_state(NR_WRITEBACK);
+		gdtc->dirty = nr_dirty + global_node_page_state(NR_WRITEBACK);
 
 		domain_dirty_limits(gdtc);
 
@@ -1768,7 +1768,7 @@ static int balance_dirty_pages(struct bdi_writeback *wb,
 		 * In normal mode, we start background writeout at the lower
 		 * background_thresh, to keep the amount of dirty memory low.
 		 */
-		if (!laptop_mode && nr_reclaimable > gdtc->bg_thresh &&
+		if (!laptop_mode && nr_dirty > gdtc->bg_thresh &&
 		    !writeback_in_progress(wb))
 			wb_start_background_writeback(wb);
 
-- 
2.30.0
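
For readers following the rename, the two places touched by the hunks above can be sketched in isolation. This is only an illustration under stated assumptions: `struct dtc_sketch`, `dtc_sketch_update()` and `should_start_background()` are hypothetical names, not kernel symbols, and the plain parameters stand in for `global_node_page_state()`, `laptop_mode` and `writeback_in_progress()`.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the two fields of the kernel's
 * struct dirty_throttle_control that the hunks above use.
 */
struct dtc_sketch {
	unsigned long dirty;     /* dirty pages plus pages under writeback */
	unsigned long bg_thresh; /* background writeback threshold */
};

/* Mirrors "gdtc->dirty = nr_dirty + global_node_page_state(NR_WRITEBACK)". */
static void dtc_sketch_update(struct dtc_sketch *dtc,
			      unsigned long nr_dirty,
			      unsigned long nr_writeback)
{
	dtc->dirty = nr_dirty + nr_writeback;
}

/*
 * Mirrors the condition guarding wb_start_background_writeback():
 * only the dirty page count is compared against bg_thresh, which is
 * why the name nr_dirty describes the variable better.
 */
static bool should_start_background(const struct dtc_sketch *dtc,
				    unsigned long nr_dirty,
				    bool laptop_mode,
				    bool writeback_in_progress)
{
	return !laptop_mode && nr_dirty > dtc->bg_thresh &&
	       !writeback_in_progress;
}
```

With bg_thresh set to 100, calling should_start_background() with nr_dirty = 150 returns true only when laptop mode is off and no writeback is already in progress.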



* Re: [PATCH v5 3/5] writeback: fix build problems of "writeback: support retrieving per group debug writeback stats of bdi"
  2024-04-23  3:46 ` [PATCH v5 3/5] writeback: fix build problems of "writeback: support retrieving per group debug writeback stats of bdi" Kemeng Shi
@ 2024-04-23  3:53   ` Kemeng Shi
  2024-04-24 13:27   ` Johannes Weiner
  1 sibling, 0 replies; 9+ messages in thread
From: Kemeng Shi @ 2024-04-23  3:53 UTC (permalink / raw)
  To: akpm, willy, jack, bfoster, tj
  Cc: dsterba, mjguzik, dhowells, linux-kernel, linux-mm, linux-fsdevel,
	sj, sfr



on 4/23/2024 11:46 AM, Kemeng Shi wrote:
> Fix two build problems:
> 1. implicit declaration of function 'cgroup_ino'.
> 2. unused variable 'stats'.
> 
> After this fix, no build problems are found when CGROUPS is disabled.
> wb_stats can be retrieved successfully when CGROUP_WRITEBACK is
> disabled:
> cat wb_stats
> WbCgIno:                    1
> WbWriteback:                0 kB
> WbReclaimable:         685440 kB
> WbDirtyThresh:      195530960 kB
> WbDirtied:             691488 kB
> WbWritten:               6048 kB
> WbWriteBandwidth:      102400 kBps
> b_dirty:                    2
> b_io:                       0
> b_more_io:                  0
> b_dirty_time:               0
> state:                      5
> 
> cat wb_stats
> WbCgIno:                    1
> WbWriteback:                0 kB
> WbReclaimable:         818944 kB
> WbDirtyThresh:      195527484 kB
> WbDirtied:             824992 kB
> WbWritten:               6048 kB
> WbWriteBandwidth:      102400 kBps
> b_dirty:                    2
> b_io:                       0
> b_more_io:                  0
> b_dirty_time:               0
> state:                      5
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reported-by: SeongJae Park <sj@kernel.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Sorry for missing the Reported-by tags.

> ---
>  mm/backing-dev.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 6ecd11bdce6e..e61bbb1bd622 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -172,7 +172,11 @@ static void wb_stats_show(struct seq_file *m, struct bdi_writeback *wb,
>  		   "b_more_io:         %10lu\n"
>  		   "b_dirty_time:      %10lu\n"
>  		   "state:             %10lx\n\n",
> +#ifdef CONFIG_CGROUP_WRITEBACK
>  		   cgroup_ino(wb->memcg_css->cgroup),
> +#else
> +		   1ul,
> +#endif
>  		   K(stats->nr_writeback),
>  		   K(stats->nr_reclaimable),
>  		   K(stats->wb_thresh),
> @@ -192,7 +196,6 @@ static int cgwb_debug_stats_show(struct seq_file *m, void *v)
>  	unsigned long background_thresh;
>  	unsigned long dirty_thresh;
>  	struct bdi_writeback *wb;
> -	struct wb_stats stats;
>  
>  	global_dirty_limits(&background_thresh, &dirty_thresh);
>  
> 



* Re: [PATCH v5 3/5] writeback: fix build problems of "writeback: support retrieving per group debug writeback stats of bdi"
  2024-04-23  3:46 ` [PATCH v5 3/5] writeback: fix build problems of "writeback: support retrieving per group debug writeback stats of bdi" Kemeng Shi
  2024-04-23  3:53   ` Kemeng Shi
@ 2024-04-24 13:27   ` Johannes Weiner
  2024-04-25  1:22     ` Kemeng Shi
  1 sibling, 1 reply; 9+ messages in thread
From: Johannes Weiner @ 2024-04-24 13:27 UTC (permalink / raw)
  To: Kemeng Shi
  Cc: akpm, willy, jack, bfoster, tj, dsterba, mjguzik, dhowells,
	linux-kernel, linux-mm, linux-fsdevel

Hi Kemeng,

On Tue, Apr 23, 2024 at 11:46:41AM +0800, Kemeng Shi wrote:
> Fix two build problems:
> 1. implicit declaration of function 'cgroup_ino'.

I just ran into this as well, with defconfig on mm-everything:

/home/hannes/src/linux/linux/mm/backing-dev.c: In function 'wb_stats_show':
/home/hannes/src/linux/linux/mm/backing-dev.c:175:33: error: 'struct bdi_writeback' has no member named 'memcg_css'
  175 |                    cgroup_ino(wb->memcg_css->cgroup),
      |                                 ^~
make[3]: *** [/home/hannes/src/linux/linux/scripts/Makefile.build:244: mm/backing-dev.o] Error 1

> ---
>  mm/backing-dev.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 6ecd11bdce6e..e61bbb1bd622 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -172,7 +172,11 @@ static void wb_stats_show(struct seq_file *m, struct bdi_writeback *wb,
>  		   "b_more_io:         %10lu\n"
>  		   "b_dirty_time:      %10lu\n"
>  		   "state:             %10lx\n\n",
> +#ifdef CONFIG_CGROUP_WRITEBACK
>  		   cgroup_ino(wb->memcg_css->cgroup),
> +#else
> +		   1ul,
> +#endif
>  		   K(stats->nr_writeback),
>  		   K(stats->nr_reclaimable),
>  		   K(stats->wb_thresh),
> @@ -192,7 +196,6 @@ static int cgwb_debug_stats_show(struct seq_file *m, void *v)
>  	unsigned long background_thresh;
>  	unsigned long dirty_thresh;
>  	struct bdi_writeback *wb;
> -	struct wb_stats stats;
>  
>  	global_dirty_limits(&background_thresh, &dirty_thresh);

The fix looks right to me, but it needs to be folded into the previous
patch. No patch should knowingly introduce an issue that is fixed
later on. This will break bisection.
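
The guard used in the quoted diff can be shown as a standalone sketch. It assumes, as the compiler error above suggests, that the cgroup pointer exists in the structure only when CONFIG_CGROUP_WRITEBACK is defined. `struct cgroup_sketch`, `struct wb_sketch`, `wb_sketch_cg_ino()` and `wb_sketch_format()` are hypothetical names for illustration, not kernel code.

```c
#include <stdio.h>

/* Hypothetical stand-in for the cgroup that owns a wb. */
struct cgroup_sketch {
	unsigned long ino;
};

/* Hypothetical stand-in for struct bdi_writeback. */
struct wb_sketch {
	unsigned long state;
#ifdef CONFIG_CGROUP_WRITEBACK
	struct cgroup_sketch *cgroup; /* present only with cgroup writeback */
#endif
};

/*
 * Every access to the conditional member needs the same guard as its
 * declaration. Without cgroup writeback, fall back to 1, the constant
 * the quoted diff prints.
 */
static unsigned long wb_sketch_cg_ino(const struct wb_sketch *wb)
{
#ifdef CONFIG_CGROUP_WRITEBACK
	return wb->cgroup->ino;
#else
	(void)wb; /* unused in this configuration */
	return 1ul;
#endif
}

/* Formats two of the wb_stats fields into buf, like snprintf(). */
static int wb_sketch_format(char *buf, size_t len,
			    const struct wb_sketch *wb)
{
	return snprintf(buf, len,
			"WbCgIno:           %10lu\n"
			"state:             %10lx\n",
			wb_sketch_cg_ino(wb), wb->state);
}
```

Built without -DCONFIG_CGROUP_WRITEBACK, the sketch compiles because the member access is never seen by the compiler, and wb_sketch_cg_ino() returns 1.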


* Re: [PATCH v5 3/5] writeback: fix build problems of "writeback: support retrieving per group debug writeback stats of bdi"
  2024-04-24 13:27   ` Johannes Weiner
@ 2024-04-25  1:22     ` Kemeng Shi
  0 siblings, 0 replies; 9+ messages in thread
From: Kemeng Shi @ 2024-04-25  1:22 UTC (permalink / raw)
  To: Johannes Weiner, akpm
  Cc: willy, jack, bfoster, tj, dsterba, mjguzik, dhowells,
	linux-kernel, linux-mm, linux-fsdevel


Hi Johannes,
on 4/24/2024 9:27 PM, Johannes Weiner wrote:
> Hi Kemeng,
> 
> On Tue, Apr 23, 2024 at 11:46:41AM +0800, Kemeng Shi wrote:
>> Fix two build problems:
>> 1. implicit declaration of function 'cgroup_ino'.
> 
> I just ran into this as well, with defconfig on mm-everything:
Sorry for this.
> 
> /home/hannes/src/linux/linux/mm/backing-dev.c: In function 'wb_stats_show':
> /home/hannes/src/linux/linux/mm/backing-dev.c:175:33: error: 'struct bdi_writeback' has no member named 'memcg_css'
>   175 |                    cgroup_ino(wb->memcg_css->cgroup),
>       |                                 ^~
> make[3]: *** [/home/hannes/src/linux/linux/scripts/Makefile.build:244: mm/backing-dev.o] Error 1
> 
>> ---
>>  mm/backing-dev.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
>> index 6ecd11bdce6e..e61bbb1bd622 100644
>> --- a/mm/backing-dev.c
>> +++ b/mm/backing-dev.c
>> @@ -172,7 +172,11 @@ static void wb_stats_show(struct seq_file *m, struct bdi_writeback *wb,
>>  		   "b_more_io:         %10lu\n"
>>  		   "b_dirty_time:      %10lu\n"
>>  		   "state:             %10lx\n\n",
>> +#ifdef CONFIG_CGROUP_WRITEBACK
>>  		   cgroup_ino(wb->memcg_css->cgroup),
>> +#else
>> +		   1ul,
>> +#endif
>>  		   K(stats->nr_writeback),
>>  		   K(stats->nr_reclaimable),
>>  		   K(stats->wb_thresh),
>> @@ -192,7 +196,6 @@ static int cgwb_debug_stats_show(struct seq_file *m, void *v)
>>  	unsigned long background_thresh;
>>  	unsigned long dirty_thresh;
>>  	struct bdi_writeback *wb;
>> -	struct wb_stats stats;
>>  
>>  	global_dirty_limits(&background_thresh, &dirty_thresh);
> 
> The fix looks right to me, but it needs to be folded into the previous
> patch. No patch should knowingly introduce an issue that is fixed
> later on. This will break bisection.
I was not sure whether the previous patch had already been applied to
the tree, so I made this fix an individual patch and mentioned in the
cover letter that it could be folded in if the previous patch is not in
the tree, or applied individually to fix the introduced issue. As Andrew
told me that little fixups are preferred over an entire resend at the
current stage, I guess a new series with this patch folded should not
be necessary. If a new series is still needed, please let me know.
I would be glad to send it.

Thanks.


