* Re: [RFC PATCH] mm/swap: fix system stuck due to infinite loop
       [not found] <CAKN5gChSwSs1Zy1r7iXHw7ZSKy7Nkr3NqcqJSn7z9yZPr3J2AA@mail.gmail.com>
@ 2021-04-03  0:44 ` Andrew Morton
  2021-04-05 21:59 ` Alexey Avramov
  1 sibling, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2021-04-03  0:44 UTC (permalink / raw)
  To: Stillinux
  Cc: linux-mm, linux-kernel, liuzhengyuan, liuyun01, Johannes Weiner,
	Hugh Dickins

On Fri, 2 Apr 2021 15:03:37 +0800 Stillinux <stillinux@gmail.com> wrote:

> Under high memory and load pressure, we ran an LTP test and found
> that the system got stuck: all direct memory reclaimers were stuck
> in io_schedule, the requests they were waiting on sat in the blk_plug
> of one process, and that process had fallen into an infinite loop
> that never flushed the plugged requests out.
> 
> That process's call flow is swap_cluster_readahead(), which brackets
> the readahead with blk_start_plug()/blk_finish_plug() and calls
> __read_swap_cache_async() -> swapcache_prepare(). When
> swapcache_prepare() returns -EEXIST, the caller falls into an infinite
> retry loop. It does call cond_resched(), but schedule() only flushes
> plugged requests via sched_submit_work() based on tsk->state (i.e. when
> the task is actually blocking), so the plugged I/O is never submitted,
> the I/O hangs, and the whole system hangs.
> 
> Since this is our first time in the swap code, we found no good way to
> fix the problem at its root. As an engineering workaround, we chose to
> make swap_cluster_readahead() aware of memory pressure as early as
> possible and reach io_schedule() to flush out the blk_plug requests, by
> changing the allocation flag in swap_readpage() to GFP_NOIO so the
> allocation no longer enters I/O-flushing memory reclaim. The system
> then operates normally, but this is not the most fundamental fix.
> 

Thanks.

I'm not understanding why swapcache_prepare() repeatedly returns
-EEXIST in this situation?

And how does the switch to GFP_NOIO fix this?  Simply by avoiding
direct reclaim altogether?
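
The loop the report describes has roughly this shape. This is a much-simplified userspace sketch with illustrative stand-ins (`*_sim` names and the globals are mine, not the kernel's code); it only shows why cond_resched() alone cannot break the cycle:

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative stand-ins, not the kernel's data structures. */
static bool entry_owned_elsewhere = true; /* the owner is itself waiting on our plugged I/O */
int plug_flushed;                         /* stays 0: nothing ever submits the plug */

static int swapcache_prepare_sim(void)
{
	/* -EEXIST: another task already claimed this swap entry */
	return entry_owned_elsewhere ? -EEXIST : 0;
}

static void cond_resched_sim(void)
{
	/*
	 * cond_resched() leaves the task in TASK_RUNNING, so
	 * sched_submit_work() never calls blk_flush_plug():
	 * the plugged read requests are not submitted here.
	 */
}

/* Shape of the retry loop in __read_swap_cache_async(), bounded only
 * so this sketch terminates; the reported system spun here forever. */
int try_read_swap_cache(int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		int err = swapcache_prepare_sim();
		if (err != -EEXIST)
			return err;
		cond_resched_sim(); /* yields CPU, but flushes nothing */
	}
	return -EEXIST;
}
```

The deadlock closes because the task that could clear the -EEXIST condition is waiting on I/O that sits, unsubmitted, in this task's plug.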

> ---
>  mm/page_io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/page_io.c b/mm/page_io.c
> index c493ce9ebcf5..87392ffabb12 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -403,7 +403,7 @@ int swap_readpage(struct page *page, bool synchronous)
>  	}
> 
>  	ret = 0;
> -	bio = bio_alloc(GFP_KERNEL, 1);
> +	bio = bio_alloc(GFP_NOIO, 1);
>  	bio_set_dev(bio, sis->bdev);
>  	bio->bi_opf = REQ_OP_READ;
>  	bio->bi_iter.bi_sector = swap_page_sector(page);



* Re: [RFC PATCH] mm/swap: fix system stuck due to infinite loop
       [not found] <CAKN5gChSwSs1Zy1r7iXHw7ZSKy7Nkr3NqcqJSn7z9yZPr3J2AA@mail.gmail.com>
  2021-04-03  0:44 ` [RFC PATCH] mm/swap: fix system stuck due to infinite loop Andrew Morton
@ 2021-04-05 21:59 ` Alexey Avramov
  2021-04-06  0:15   ` [PATCH] mm/vmscan: add sysctl knobs for protecting the specified kernel test robot
  2021-04-06  1:16   ` kernel test robot
  1 sibling, 2 replies; 4+ messages in thread
From: Alexey Avramov @ 2021-04-05 21:59 UTC (permalink / raw)
  To: Stillinux; +Cc: akpm, linux-mm, linux-kernel, liuzhengyuan, liuyun01

> In the case of high system memory and load pressure, we ran ltp test
> and found that the system was stuck, the direct memory reclaim was
> all stuck in io_schedule

> For the first time involving the swap part, there is no good way to fix
> the problem

The solution is to protect clean file pages.

Look at this:

> On ChromiumOS, we do not use swap. When memory is low, the only 
> way to free memory is to reclaim pages from the file list. This 
> results in a lot of thrashing under low memory conditions. We see 
> the system become unresponsive for minutes before it eventually OOMs. 
> We also see very slow browser tab switching under low memory. Instead 
> of an unresponsive system, we'd really like the kernel to OOM as soon 
> as it starts to thrash. If it can't keep the working set in memory, 
> then OOM. Losing one of many tabs is a better behaviour for the user 
> than an unresponsive system.

> This patch creates a new sysctl, min_filelist_kbytes, which disables
> reclaim of file-backed pages when there are less than min_filelist_kbytes
> worth of such pages in the cache. This tunable is handy for low memory
> systems using solid-state storage where interactive response is more important
> than not OOMing.

> With this patch and min_filelist_kbytes set to 50000, I see very little block 
> layer activity during low memory. The system stays responsive under low 
> memory and browser tab switching is fast. Eventually, a process gets killed
> by OOM. Without this patch, the system gets wedged for minutes before it 
> eventually OOMs.
https://lore.kernel.org/patchwork/patch/222042/

This patch can almost completely eliminate thrashing under memory pressure.

Effects:
- Improved system responsiveness under low-memory conditions;
- Improved performance in I/O-bound tasks under memory pressure;
- The OOM killer triggers sooner (with hard protection);
- Fast system recovery after OOM.

Read more: https://github.com/hakavlad/le9-patch
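
With a patched kernel, the knobs would be set like any other VM sysctl. A hypothetical /etc/sysctl.d fragment; the values here are purely illustrative (the patch's own suggested defaults are 0):

```
# Hypothetical tuning -- these knobs only exist with this patch applied.
# Best-effort: try to keep ~250 MiB of clean file pages resident.
vm.clean_low_kbytes = 250000
# Hard floor: never reclaim below ~50 MiB of clean file pages.
vm.clean_min_kbytes = 50000
```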

The patch:

From 371e3e5290652e97d5279d8cd215cd356c1fb47b Mon Sep 17 00:00:00 2001
From: Alexey Avramov <hakavlad@inbox.lv>
Date: Mon, 5 Apr 2021 01:53:26 +0900
Subject: [PATCH] mm/vmscan: add sysctl knobs for protecting the specified
 amount of clean file cache

The kernel does not have a mechanism for targeted protection of clean
file pages (CFP). A certain amount of CFP is required by userspace for
normal operation; first of all, it needs a cache of shared libraries and
executable files. If the volume of the CFP cache falls below a certain
level, thrashing and even livelock occur.

Protection of CFP may be used to prevent thrashing and reduce I/O under
memory pressure. Hard protection of CFP may be used to avoid high latency
and prevent livelock in near-OOM conditions. This patch provides sysctl
knobs for protecting the specified amount of clean file cache under memory
pressure.

The vm.clean_low_kbytes sysctl knob provides *best-effort* protection of
CFP. The CFP on the current node won't be reclaimed under memory pressure
when their volume is below vm.clean_low_kbytes *unless* we threaten to OOM,
have no swap space, or vm.swappiness=0. Setting it to a high value may
result in an early eviction of anonymous pages into the swap space by
attempting to hold the protected amount of clean file pages in memory. The
default value is defined by CONFIG_CLEAN_LOW_KBYTES (suggested 0 in
Kconfig).

The vm.clean_min_kbytes sysctl knob provides *hard* protection of CFP. The
CFP on the current node won't be reclaimed under memory pressure when their
volume is below vm.clean_min_kbytes. Setting it to a high value may result
in an early out-of-memory condition due to the inability to reclaim the
protected amount of CFP when other types of pages cannot be reclaimed. The
default value is defined by CONFIG_CLEAN_MIN_KBYTES (suggested 0 in
Kconfig).

Reported-by: Artem S. Tashkinov <aros@gmx.com>
Signed-off-by: Alexey Avramov <hakavlad@inbox.lv>
---
 Documentation/admin-guide/sysctl/vm.rst | 37 +++++++++++++++++++++
 include/linux/mm.h                      |  3 ++
 kernel/sysctl.c                         | 14 ++++++++
 mm/Kconfig                              | 35 +++++++++++++++++++
 mm/vmscan.c                             | 59 +++++++++++++++++++++++++++++++++
 5 files changed, 148 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f455fa00c..5d5ddfc85 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -26,6 +26,8 @@ Currently, these files are in /proc/sys/vm:

 - admin_reserve_kbytes
 - block_dump
+- clean_low_kbytes
+- clean_min_kbytes
 - compact_memory
 - compaction_proactiveness
 - compact_unevictable_allowed
@@ -113,6 +115,41 @@ block_dump enables block I/O debugging when set to a nonzero value. More
 information on block I/O debugging is in Documentation/admin-guide/laptops/laptop-mode.rst.


+clean_low_kbytes
+=====================
+
+This knob provides *best-effort* protection of clean file pages. The clean file
+pages on the current node won't be reclaimed under memory pressure when their
+volume is below vm.clean_low_kbytes *unless* we threaten to OOM or have no
+swap space or vm.swappiness=0.
+
+Protection of clean file pages may be used to prevent thrashing and
+reduce I/O under low-memory conditions.
+
+Setting it to a high value may result in an early eviction of anonymous pages
+into the swap space by attempting to hold the protected amount of clean file
+pages in memory.
+
+The default value is defined by CONFIG_CLEAN_LOW_KBYTES.
+
+
+clean_min_kbytes
+=====================
+
+This knob provides *hard* protection of clean file pages. The clean file pages
+on the current node won't be reclaimed under memory pressure when their volume
+is below vm.clean_min_kbytes.
+
+Hard protection of clean file pages may be used to avoid high latency and
+prevent livelock in near-OOM conditions.
+
+Setting it to a high value may result in an early out-of-memory condition due to
+the inability to reclaim the protected amount of clean file pages when other
+types of pages cannot be reclaimed.
+
+The default value is defined by CONFIG_CLEAN_MIN_KBYTES.
+
+
 compact_memory
 ==============

diff --git a/include/linux/mm.h b/include/linux/mm.h
index db6ae4d3f..7799f1555 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -202,6 +202,9 @@ static inline void __mm_zero_struct_page(struct page *page)

 extern int sysctl_max_map_count;

+extern unsigned long sysctl_clean_low_kbytes;
+extern unsigned long sysctl_clean_min_kbytes;
+
 extern unsigned long sysctl_user_reserve_kbytes;
 extern unsigned long sysctl_admin_reserve_kbytes;

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index afad08596..854b311cd 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -3083,6 +3083,20 @@ static struct ctl_table vm_table[] = {
 	},
 #endif
 	{
+		.procname	= "clean_low_kbytes",
+		.data		= &sysctl_clean_low_kbytes,
+		.maxlen		= sizeof(sysctl_clean_low_kbytes),
+		.mode		= 0644,
+		.proc_handler	= proc_doulongvec_minmax,
+	},
+	{
+		.procname	= "clean_min_kbytes",
+		.data		= &sysctl_clean_min_kbytes,
+		.maxlen		= sizeof(sysctl_clean_min_kbytes),
+		.mode		= 0644,
+		.proc_handler	= proc_doulongvec_minmax,
+	},
+	{
 		.procname	= "user_reserve_kbytes",
 		.data		= &sysctl_user_reserve_kbytes,
 		.maxlen		= sizeof(sysctl_user_reserve_kbytes),
diff --git a/mm/Kconfig b/mm/Kconfig
index 390165ffb..3915c71e1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -122,6 +122,41 @@ config SPARSEMEM_VMEMMAP
 	  pfn_to_page and page_to_pfn operations.  This is the most
 	  efficient option when sufficient kernel resources are available.

+config CLEAN_LOW_KBYTES
+	int "Default value for vm.clean_low_kbytes"
+	depends on SYSCTL
+	default "0"
+	help
+	  The vm.clean_low_kbytes sysctl knob provides *best-effort*
+	  protection of clean file pages. The clean file pages on the current
+	  node won't be reclaimed under memory pressure when their volume is
+	  below vm.clean_low_kbytes *unless* we threaten to OOM or have
+	  no swap space or vm.swappiness=0.
+
+	  Protection of clean file pages may be used to prevent thrashing and
+	  reduce I/O under low-memory conditions.
+
+	  Setting it to a high value may result in an early eviction of anonymous
+	  pages into the swap space by attempting to hold the protected amount of
+	  clean file pages in memory.
+
+config CLEAN_MIN_KBYTES
+	int "Default value for vm.clean_min_kbytes"
+	depends on SYSCTL
+	default "0"
+	help
+	  The vm.clean_min_kbytes sysctl knob provides *hard* protection
+	  of clean file pages. The clean file pages on the current node won't be
+	  reclaimed under memory pressure when their volume is below
+	  vm.clean_min_kbytes.
+
+	  Hard protection of clean file pages may be used to avoid high latency and
+	  prevent livelock in near-OOM conditions.
+
+	  Setting it to a high value may result in an early out-of-memory condition
+	  due to the inability to reclaim the protected amount of clean file pages
+	  when other types of pages cannot be reclaimed.
+
 config HAVE_MEMBLOCK_PHYS_MAP
 	bool

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7b4e31eac..77e98c43e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -120,6 +120,19 @@ struct scan_control {
 	/* The file pages on the current node are dangerously low */
 	unsigned int file_is_tiny:1;

+	/*
+	 * The clean file pages on the current node won't be reclaimed when
+	 * their volume is below vm.clean_low_kbytes *unless* we threaten
+	 * to OOM or have no swap space or vm.swappiness=0.
+	 */
+	unsigned int clean_below_low:1;
+
+	/*
+	 * The clean file pages on the current node won't be reclaimed when
+	 * their volume is below vm.clean_min_kbytes.
+	 */
+	unsigned int clean_below_min:1;
+
 	/* Allocation order */
 	s8 order;

@@ -166,6 +179,17 @@ struct scan_control {
 #define prefetchw_prev_lru_page(_page, _base, _field) do { } while (0)
 #endif

+#if CONFIG_CLEAN_LOW_KBYTES < 0
+#error "CONFIG_CLEAN_LOW_KBYTES must be >= 0"
+#endif
+
+#if CONFIG_CLEAN_MIN_KBYTES < 0
+#error "CONFIG_CLEAN_MIN_KBYTES must be >= 0"
+#endif
+
+unsigned long sysctl_clean_low_kbytes __read_mostly = CONFIG_CLEAN_LOW_KBYTES;
+unsigned long sysctl_clean_min_kbytes __read_mostly = CONFIG_CLEAN_MIN_KBYTES;
+
 /*
  * From 0 .. 200.  Higher means more swappy.
  */
@@ -2283,6 +2307,16 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	}

 	/*
+	 * Force-scan anon if clean file pages are below vm.clean_min_kbytes
+	 * or vm.clean_low_kbytes (unless the swappiness setting
+	 * disagrees with swapping).
+	 */
+	if ((sc->clean_below_low || sc->clean_below_min) && swappiness) {
+		scan_balance = SCAN_ANON;
+		goto out;
+	}
+
+	/*
 	 * If there is enough inactive page cache, we do not reclaim
 	 * anything from the anonymous working right now.
 	 */
@@ -2418,6 +2452,13 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 			BUG();
 		}

+		/*
+		 * Don't reclaim clean file pages when their volume is below
+		 * vm.clean_min_kbytes.
+		 */
+		if (file && sc->clean_below_min)
+			scan = 0;
+
 		nr[lru] = scan;
 	}
 }
@@ -2768,6 +2809,24 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 			anon >> sc->priority;
 	}

+	if (sysctl_clean_low_kbytes || sysctl_clean_min_kbytes) {
+		unsigned long reclaimable_file, dirty, clean;
+
+		reclaimable_file =
+			node_page_state(pgdat, NR_ACTIVE_FILE) +
+			node_page_state(pgdat, NR_INACTIVE_FILE) +
+			node_page_state(pgdat, NR_ISOLATED_FILE);
+		dirty = node_page_state(pgdat, NR_FILE_DIRTY);
+		if (reclaimable_file > dirty)
+			clean = (reclaimable_file - dirty) << (PAGE_SHIFT - 10);
+
+		sc->clean_below_low = clean < sysctl_clean_low_kbytes;
+		sc->clean_below_min = clean < sysctl_clean_min_kbytes;
+	} else {
+		sc->clean_below_low = false;
+		sc->clean_below_min = false;
+	}
+
 	shrink_node_memcgs(pgdat, sc);

 	if (reclaim_state) {
--
2.11.0
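
In userspace terms, the policy this patch wires into shrink_node() and get_scan_count() can be sketched as follows. This is a simplification under stated assumptions (4 KiB pages, only two scan_balance states; the real code has more), not the patch itself:

```c
#include <stdbool.h>

#define PAGE_SHIFT 12 /* assume 4 KiB pages */

enum scan_balance { SCAN_EQUAL, SCAN_ANON };

struct decision {
	enum scan_balance balance;
	bool reclaim_file; /* may the file LRUs be scanned at all? */
};

/* Sketch of the two hooks: shrink_node() computes clean-file kbytes,
 * get_scan_count() turns the two thresholds into scan policy. */
struct decision decide(unsigned long reclaimable_file_pages,
		       unsigned long dirty_pages,
		       unsigned long low_kbytes,
		       unsigned long min_kbytes,
		       int swappiness)
{
	unsigned long clean = 0;

	if (reclaimable_file_pages > dirty_pages)
		clean = (reclaimable_file_pages - dirty_pages)
			<< (PAGE_SHIFT - 10); /* pages -> KiB */

	bool below_low = clean < low_kbytes;
	bool below_min = clean < min_kbytes;

	struct decision d = { SCAN_EQUAL, true };
	if ((below_low || below_min) && swappiness)
		d.balance = SCAN_ANON;  /* push reclaim toward anon/swap */
	if (below_min)
		d.reclaim_file = false; /* hard: file scan target forced to 0 */
	return d;
}
```

So best-effort protection only biases reclaim toward anonymous pages, while hard protection additionally zeroes the file scan target, which is what can force an early OOM.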



* Re: [PATCH] mm/vmscan: add sysctl knobs for protecting the specified
  2021-04-05 21:59 ` Alexey Avramov
@ 2021-04-06  0:15   ` kernel test robot
  2021-04-06  1:16   ` kernel test robot
  1 sibling, 0 replies; 4+ messages in thread
From: kernel test robot @ 2021-04-06  0:15 UTC (permalink / raw)
  To: Alexey Avramov, Stillinux
  Cc: kbuild-all, akpm, linux-mm, linux-kernel, liuzhengyuan, liuyun01

[-- Attachment #1: Type: text/plain, Size: 3048 bytes --]

Hi Alexey,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linux/master]
[also build test ERROR on linus/master v5.12-rc6 next-20210401]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Alexey-Avramov/mm-vmscan-add-sysctl-knobs-for-protecting-the-specified/20210406-061034
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 5e46d1b78a03d52306f21f77a4e4a144b6d31486
config: parisc-randconfig-m031-20210405 (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/a5eeb8d197a8e10c333422e9cc0f2c7d976a3426
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Alexey-Avramov/mm-vmscan-add-sysctl-knobs-for-protecting-the-specified/20210406-061034
        git checkout a5eeb8d197a8e10c333422e9cc0f2c7d976a3426
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

>> mm/vmscan.c:180:5: warning: "CONFIG_CLEAN_LOW_KBYTES" is not defined, evaluates to 0 [-Wundef]
     180 | #if CONFIG_CLEAN_LOW_KBYTES < 0
         |     ^~~~~~~~~~~~~~~~~~~~~~~
>> mm/vmscan.c:184:5: warning: "CONFIG_CLEAN_MIN_KBYTES" is not defined, evaluates to 0 [-Wundef]
     184 | #if CONFIG_CLEAN_MIN_KBYTES < 0
         |     ^~~~~~~~~~~~~~~~~~~~~~~
>> mm/vmscan.c:188:55: error: 'CONFIG_CLEAN_LOW_KBYTES' undeclared here (not in a function)
     188 | unsigned long sysctl_clean_low_kbytes __read_mostly = CONFIG_CLEAN_LOW_KBYTES;
         |                                                       ^~~~~~~~~~~~~~~~~~~~~~~
>> mm/vmscan.c:189:55: error: 'CONFIG_CLEAN_MIN_KBYTES' undeclared here (not in a function)
     189 | unsigned long sysctl_clean_min_kbytes __read_mostly = CONFIG_CLEAN_MIN_KBYTES;
         |                                                       ^~~~~~~~~~~~~~~~~~~~~~~
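
The root cause is that both Kconfig symbols `depends on SYSCTL`, so in a CONFIG_SYSCTL=n build `CONFIG_CLEAN_LOW_KBYTES`/`CONFIG_CLEAN_MIN_KBYTES` are never defined, yet mm/vmscan.c references them unconditionally. One possible guard is sketched below (dropping the `depends on SYSCTL` would be the cleaner fix); this is my illustration, not a fix posted in the thread:

```c
/* Fallback defines so the file still builds when the Kconfig symbols
 * are absent (e.g. CONFIG_SYSCTL=n). One possible fix only. */
#ifndef CONFIG_CLEAN_LOW_KBYTES
#define CONFIG_CLEAN_LOW_KBYTES 0
#endif
#ifndef CONFIG_CLEAN_MIN_KBYTES
#define CONFIG_CLEAN_MIN_KBYTES 0
#endif

unsigned long sysctl_clean_low_kbytes = CONFIG_CLEAN_LOW_KBYTES;
unsigned long sysctl_clean_min_kbytes = CONFIG_CLEAN_MIN_KBYTES;
```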


vim +/CONFIG_CLEAN_LOW_KBYTES +188 mm/vmscan.c

   179	
 > 180	#if CONFIG_CLEAN_LOW_KBYTES < 0
   181	#error "CONFIG_CLEAN_LOW_KBYTES must be >= 0"
   182	#endif
   183	
 > 184	#if CONFIG_CLEAN_MIN_KBYTES < 0
   185	#error "CONFIG_CLEAN_MIN_KBYTES must be >= 0"
   186	#endif
   187	
 > 188	unsigned long sysctl_clean_low_kbytes __read_mostly = CONFIG_CLEAN_LOW_KBYTES;
 > 189	unsigned long sysctl_clean_min_kbytes __read_mostly = CONFIG_CLEAN_MIN_KBYTES;
   190	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26587 bytes --]


* Re: [PATCH] mm/vmscan: add sysctl knobs for protecting the specified
  2021-04-05 21:59 ` Alexey Avramov
  2021-04-06  0:15   ` [PATCH] mm/vmscan: add sysctl knobs for protecting the specified kernel test robot
@ 2021-04-06  1:16   ` kernel test robot
  1 sibling, 0 replies; 4+ messages in thread
From: kernel test robot @ 2021-04-06  1:16 UTC (permalink / raw)
  To: Alexey Avramov, Stillinux
  Cc: kbuild-all, clang-built-linux, akpm, linux-mm, linux-kernel,
	liuzhengyuan, liuyun01

[-- Attachment #1: Type: text/plain, Size: 16131 bytes --]

Hi Alexey,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linux/master]
[also build test WARNING on linus/master v5.12-rc6 next-20210401]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Alexey-Avramov/mm-vmscan-add-sysctl-knobs-for-protecting-the-specified/20210406-061034
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 5e46d1b78a03d52306f21f77a4e4a144b6d31486
config: s390-randconfig-r006-20210405 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project a46f59a747a7273cc439efaf3b4f98d8b63d2f20)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/0day-ci/linux/commit/a5eeb8d197a8e10c333422e9cc0f2c7d976a3426
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Alexey-Avramov/mm-vmscan-add-sysctl-knobs-for-protecting-the-specified/20210406-061034
        git checkout a5eeb8d197a8e10c333422e9cc0f2c7d976a3426
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=s390 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from mm/vmscan.c:20:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:22:
   In file included from include/linux/writeback.h:14:
   In file included from include/linux/blk-cgroup.h:23:
   In file included from include/linux/blkdev.h:26:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:80:
   include/asm-generic/io.h:464:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:36:59: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
                                                             ^
   include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
   #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
                                                        ^
   In file included from mm/vmscan.c:20:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:22:
   In file included from include/linux/writeback.h:14:
   In file included from include/linux/blk-cgroup.h:23:
   In file included from include/linux/blkdev.h:26:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:80:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
   #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
                                                        ^
   In file included from mm/vmscan.c:20:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:22:
   In file included from include/linux/writeback.h:14:
   In file included from include/linux/blk-cgroup.h:23:
   In file included from include/linux/blkdev.h:26:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:80:
   include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:511:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:521:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:609:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsb(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:617:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsw(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:625:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsl(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:634:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesb(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:643:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesw(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:652:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesl(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
>> mm/vmscan.c:2819:7: warning: variable 'clean' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                   if (reclaimable_file > dirty)
                       ^~~~~~~~~~~~~~~~~~~~~~~~
   mm/vmscan.c:2822:25: note: uninitialized use occurs here
                   sc->clean_below_low = clean < sysctl_clean_low_kbytes;
                                         ^~~~~
   mm/vmscan.c:2819:3: note: remove the 'if' if its condition is always true
                   if (reclaimable_file > dirty)
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   mm/vmscan.c:2812:47: note: initialize the variable 'clean' to silence this warning
                   unsigned long reclaimable_file, dirty, clean;
                                                               ^
                                                                = 0
   13 warnings generated.
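
Distilled from the report, this is the classic maybe-uninitialized pattern: `clean` is only assigned inside the `if`, so when `reclaimable_file <= dirty` it is read uninitialized. Initializing it to 0, as the robot suggests (i.e. treating a fully dirty file LRU as having no clean pages), is the minimal fix. A standalone illustration:

```c
/* `clean` must start at 0; without this initializer, the
 * reclaimable_file <= dirty case reads garbage, which is
 * exactly what -Wsometimes-uninitialized flags above. */
unsigned long clean_kbytes(unsigned long reclaimable_file,
			   unsigned long dirty,
			   unsigned int page_shift)
{
	unsigned long clean = 0; /* the missing initialization */

	if (reclaimable_file > dirty)
		clean = (reclaimable_file - dirty) << (page_shift - 10);
	return clean;
}
```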


vim +2819 mm/vmscan.c

  2706	
  2707	static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
  2708	{
  2709		struct reclaim_state *reclaim_state = current->reclaim_state;
  2710		unsigned long nr_reclaimed, nr_scanned;
  2711		struct lruvec *target_lruvec;
  2712		bool reclaimable = false;
  2713		unsigned long file;
  2714	
  2715		target_lruvec = mem_cgroup_lruvec(sc->target_mem_cgroup, pgdat);
  2716	
  2717	again:
  2718		memset(&sc->nr, 0, sizeof(sc->nr));
  2719	
  2720		nr_reclaimed = sc->nr_reclaimed;
  2721		nr_scanned = sc->nr_scanned;
  2722	
  2723		/*
  2724		 * Determine the scan balance between anon and file LRUs.
  2725		 */
  2726		spin_lock_irq(&target_lruvec->lru_lock);
  2727		sc->anon_cost = target_lruvec->anon_cost;
  2728		sc->file_cost = target_lruvec->file_cost;
  2729		spin_unlock_irq(&target_lruvec->lru_lock);
  2730	
  2731		/*
  2732		 * Target desirable inactive:active list ratios for the anon
  2733		 * and file LRU lists.
  2734		 */
  2735		if (!sc->force_deactivate) {
  2736			unsigned long refaults;
  2737	
  2738			refaults = lruvec_page_state(target_lruvec,
  2739					WORKINGSET_ACTIVATE_ANON);
  2740			if (refaults != target_lruvec->refaults[0] ||
  2741				inactive_is_low(target_lruvec, LRU_INACTIVE_ANON))
  2742				sc->may_deactivate |= DEACTIVATE_ANON;
  2743			else
  2744				sc->may_deactivate &= ~DEACTIVATE_ANON;
  2745	
  2746			/*
  2747			 * When refaults are being observed, it means a new
  2748			 * workingset is being established. Deactivate to get
  2749			 * rid of any stale active pages quickly.
  2750			 */
  2751			refaults = lruvec_page_state(target_lruvec,
  2752					WORKINGSET_ACTIVATE_FILE);
  2753			if (refaults != target_lruvec->refaults[1] ||
  2754			    inactive_is_low(target_lruvec, LRU_INACTIVE_FILE))
  2755				sc->may_deactivate |= DEACTIVATE_FILE;
  2756			else
  2757				sc->may_deactivate &= ~DEACTIVATE_FILE;
  2758		} else
  2759			sc->may_deactivate = DEACTIVATE_ANON | DEACTIVATE_FILE;
  2760	
  2761		/*
  2762		 * If we have plenty of inactive file pages that aren't
  2763		 * thrashing, try to reclaim those first before touching
  2764		 * anonymous pages.
  2765		 */
  2766		file = lruvec_page_state(target_lruvec, NR_INACTIVE_FILE);
  2767		if (file >> sc->priority && !(sc->may_deactivate & DEACTIVATE_FILE))
  2768			sc->cache_trim_mode = 1;
  2769		else
  2770			sc->cache_trim_mode = 0;
  2771	
  2772		/*
  2773		 * Prevent the reclaimer from falling into the cache trap: as
  2774		 * cache pages start out inactive, every cache fault will tip
  2775		 * the scan balance towards the file LRU.  And as the file LRU
  2776		 * shrinks, so does the window for rotation from references.
  2777		 * This means we have a runaway feedback loop where a tiny
  2778		 * thrashing file LRU becomes infinitely more attractive than
  2779		 * anon pages.  Try to detect this based on file LRU size.
  2780		 */
  2781		if (!cgroup_reclaim(sc)) {
  2782			unsigned long total_high_wmark = 0;
  2783			unsigned long free, anon;
  2784			int z;
  2785	
  2786			free = sum_zone_node_page_state(pgdat->node_id, NR_FREE_PAGES);
  2787			file = node_page_state(pgdat, NR_ACTIVE_FILE) +
  2788				   node_page_state(pgdat, NR_INACTIVE_FILE);
  2789	
  2790			for (z = 0; z < MAX_NR_ZONES; z++) {
  2791				struct zone *zone = &pgdat->node_zones[z];
  2792				if (!managed_zone(zone))
  2793					continue;
  2794	
  2795				total_high_wmark += high_wmark_pages(zone);
  2796			}
  2797	
  2798			/*
  2799			 * Consider anon: if that's low too, this isn't a
  2800			 * runaway file reclaim problem, but rather just
  2801			 * extreme pressure. Reclaim as per usual then.
  2802			 */
  2803			anon = node_page_state(pgdat, NR_INACTIVE_ANON);
  2804	
  2805			sc->file_is_tiny =
  2806				file + free <= total_high_wmark &&
  2807				!(sc->may_deactivate & DEACTIVATE_ANON) &&
  2808				anon >> sc->priority;
  2809		}
  2810	
  2811		if (sysctl_clean_low_kbytes || sysctl_clean_min_kbytes) {
  2812			unsigned long reclaimable_file, dirty, clean;
  2813	
  2814			reclaimable_file =
  2815				node_page_state(pgdat, NR_ACTIVE_FILE) +
  2816				node_page_state(pgdat, NR_INACTIVE_FILE) +
  2817				node_page_state(pgdat, NR_ISOLATED_FILE);
  2818			dirty = node_page_state(pgdat, NR_FILE_DIRTY);
> 2819			if (reclaimable_file > dirty)
  2820				clean = (reclaimable_file - dirty) << (PAGE_SHIFT - 10);
  2821	
  2822			sc->clean_below_low = clean < sysctl_clean_low_kbytes;
  2823			sc->clean_below_min = clean < sysctl_clean_min_kbytes;
  2824		} else {
  2825			sc->clean_below_low = false;
  2826			sc->clean_below_min = false;
  2827		}
  2828	
  2829		shrink_node_memcgs(pgdat, sc);
  2830	
  2831		if (reclaim_state) {
  2832			sc->nr_reclaimed += reclaim_state->reclaimed_slab;
  2833			reclaim_state->reclaimed_slab = 0;
  2834		}
  2835	
  2836		/* Record the subtree's reclaim efficiency */
  2837		vmpressure(sc->gfp_mask, sc->target_mem_cgroup, true,
  2838			   sc->nr_scanned - nr_scanned,
  2839			   sc->nr_reclaimed - nr_reclaimed);
  2840	
  2841		if (sc->nr_reclaimed - nr_reclaimed)
  2842			reclaimable = true;
  2843	
  2844		if (current_is_kswapd()) {
  2845			/*
  2846			 * If reclaim is isolating dirty pages under writeback,
  2847			 * it implies that the long-lived page allocation rate
  2848			 * is exceeding the page laundering rate. Either the
  2849			 * global limits are not being effective at throttling
  2850			 * processes due to the page distribution throughout
  2851			 * zones or there is heavy usage of a slow backing
  2852			 * device. The only option is to throttle from reclaim
  2853			 * context which is not ideal as there is no guarantee
  2854			 * the dirtying process is throttled in the same way
  2855			 * balance_dirty_pages() manages.
  2856			 *
  2857			 * Once a node is flagged PGDAT_WRITEBACK, kswapd will
  2858			 * count the number of pages under pages flagged for
  2859			 * immediate reclaim and stall if any are encountered
  2860			 * in the nr_immediate check below.
  2861			 */
  2862			if (sc->nr.writeback && sc->nr.writeback == sc->nr.taken)
  2863				set_bit(PGDAT_WRITEBACK, &pgdat->flags);
  2864	
  2865			/* Allow kswapd to start writing pages during reclaim.*/
  2866			if (sc->nr.unqueued_dirty == sc->nr.file_taken)
  2867				set_bit(PGDAT_DIRTY, &pgdat->flags);
  2868	
  2869			/*
  2870			 * If kswapd scans pages marked for immediate
  2871			 * reclaim and under writeback (nr_immediate), it
  2872			 * implies that pages are cycling through the LRU
  2873			 * faster than they are written so also forcibly stall.
  2874			 */
  2875			if (sc->nr.immediate)
  2876				congestion_wait(BLK_RW_ASYNC, HZ/10);
  2877		}
  2878	
  2879		/*
  2880		 * Tag a node/memcg as congested if all the dirty pages
  2881		 * scanned were backed by a congested BDI and
  2882		 * wait_iff_congested will stall.
  2883		 *
  2884		 * Legacy memcg will stall in page writeback so avoid forcibly
  2885		 * stalling in wait_iff_congested().
  2886		 */
  2887		if ((current_is_kswapd() ||
  2888		     (cgroup_reclaim(sc) && writeback_throttling_sane(sc))) &&
  2889		    sc->nr.dirty && sc->nr.dirty == sc->nr.congested)
  2890			set_bit(LRUVEC_CONGESTED, &target_lruvec->flags);
  2891	
  2892		/*
  2893		 * Stall direct reclaim for IO completions if underlying BDIs
  2894		 * and node is congested. Allow kswapd to continue until it
  2895		 * starts encountering unqueued dirty pages or cycling through
  2896		 * the LRU too quickly.
  2897		 */
  2898		if (!current_is_kswapd() && current_may_throttle() &&
  2899		    !sc->hibernation_mode &&
  2900		    test_bit(LRUVEC_CONGESTED, &target_lruvec->flags))
  2901			wait_iff_congested(BLK_RW_ASYNC, HZ/10);
  2902	
  2903		if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
  2904					    sc))
  2905			goto again;
  2906	
  2907		/*
  2908		 * Kswapd gives up on balancing particular nodes after too
  2909		 * many failures to reclaim anything from them and goes to
  2910		 * sleep. On reclaim progress, reset the failure counter. A
  2911		 * successful direct reclaim run will revive a dormant kswapd.
  2912		 */
  2913		if (reclaimable)
  2914			pgdat->kswapd_failures = 0;
  2915	}
  2916	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org



end of thread, other threads:[~2021-04-06  1:17 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAKN5gChSwSs1Zy1r7iXHw7ZSKy7Nkr3NqcqJSn7z9yZPr3J2AA@mail.gmail.com>
2021-04-03  0:44 ` [RFC PATCH] mm/swap: fix system stuck due to infinite loop Andrew Morton
2021-04-05 21:59 ` Alexey Avramov
2021-04-06  0:15   ` [PATCH] mm/vmscan: add sysctl knobs for protecting the specified kernel test robot
2021-04-06  1:16   ` kernel test robot
