* Re: [RFC PATCH] mm/swap: fix system stuck due to infinite loop
[not found] <CAKN5gChSwSs1Zy1r7iXHw7ZSKy7Nkr3NqcqJSn7z9yZPr3J2AA@mail.gmail.com>
@ 2021-04-03 0:44 ` Andrew Morton
2021-04-05 21:59 ` Alexey Avramov
1 sibling, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2021-04-03 0:44 UTC (permalink / raw)
To: Stillinux
Cc: linux-mm, linux-kernel, liuzhengyuan, liuyun01, Johannes Weiner,
Hugh Dickins
On Fri, 2 Apr 2021 15:03:37 +0800 Stillinux <stillinux@gmail.com> wrote:
> In the case of high system memory and load pressure, we ran the LTP test
> and found that the system was stuck: direct memory reclaim was all
> stuck in io_schedule, the waiting requests were stuck in the blk_plug
> list of one process, and that process fell into an infinite loop,
> never flushing out the plugged requests.
>
> The call flow of this process is swap_cluster_readahead, which wraps
> the readahead in blk_start_plug()/blk_finish_plug() and then calls
> swap_cluster_readahead->__read_swap_cache_async->swapcache_prepare.
> When swapcache_prepare() returns -EEXIST, the caller falls into an
> infinite loop: even though cond_resched() is called, the scheduler only
> flushes the blk_plug list (via sched_submit_work()) based on tsk->state,
> so the plugged requests are never issued, the I/O hangs, and the whole
> system hangs.
>
> This is our first time in the swap code, and we have no good way to fix
> the problem at its root. As an engineering workaround, we chose to make
> swap_cluster_readahead aware of memory pressure as early as possible and
> let io_schedule flush out the blk_plug requests, by changing the
> allocation flag in swap_readpage() to GFP_NOIO so that it no longer
> enters memory reclaim that flushes I/O. The system then operates
> normally, but this is not the most fundamental fix.
>
Thanks.
I'm not understanding why swapcache_prepare() repeatedly returns
-EEXIST in this situation?
And how does the switch to GFP_NOIO fix this? Simply by avoiding
direct reclaim altogether?
> ---
> mm/page_io.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page_io.c b/mm/page_io.c
> index c493ce9ebcf5..87392ffabb12 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -403,7 +403,7 @@ int swap_readpage(struct page *page, bool synchronous)
> }
>
> ret = 0;
> - bio = bio_alloc(GFP_KERNEL, 1);
> + bio = bio_alloc(GFP_NOIO, 1);
> bio_set_dev(bio, sis->bdev);
> bio->bi_opf = REQ_OP_READ;
> bio->bi_iter.bi_sector = swap_page_sector(page);
* Re: [RFC PATCH] mm/swap: fix system stuck due to infinite loop
[not found] <CAKN5gChSwSs1Zy1r7iXHw7ZSKy7Nkr3NqcqJSn7z9yZPr3J2AA@mail.gmail.com>
2021-04-03 0:44 ` [RFC PATCH] mm/swap: fix system stuck due to infinite loop Andrew Morton
@ 2021-04-05 21:59 ` Alexey Avramov
2021-04-06 0:15 ` [PATCH] mm/vmscan: add sysctl knobs for protecting the specified kernel test robot
2021-04-06 1:16 ` kernel test robot
1 sibling, 2 replies; 4+ messages in thread
From: Alexey Avramov @ 2021-04-05 21:59 UTC (permalink / raw)
To: Stillinux; +Cc: akpm, linux-mm, linux-kernel, liuzhengyuan, liuyun01
> In the case of high system memory and load pressure, we ran the LTP test
> and found that the system was stuck: direct memory reclaim was all
> stuck in io_schedule
> This is our first time in the swap code, and we have no good way to fix
> the problem at its root
The solution is to protect clean file pages.
Look at this:
> On ChromiumOS, we do not use swap. When memory is low, the only
> way to free memory is to reclaim pages from the file list. This
> results in a lot of thrashing under low memory conditions. We see
> the system become unresponsive for minutes before it eventually OOMs.
> We also see very slow browser tab switching under low memory. Instead
> of an unresponsive system, we'd really like the kernel to OOM as soon
> as it starts to thrash. If it can't keep the working set in memory,
> then OOM. Losing one of many tabs is a better behaviour for the user
> than an unresponsive system.
> This patch creates a new sysctl, min_filelist_kbytes, which disables
> reclaim of file-backed pages when there are less than min_filelist_kbytes
> worth of such pages in the cache. This tunable is handy for low memory
> systems using solid-state storage where interactive response is more important
> than not OOMing.
> With this patch and min_filelist_kbytes set to 50000, I see very little block
> layer activity during low memory. The system stays responsive under low
> memory and browser tab switching is fast. Eventually, a process gets killed
> by OOM. Without this patch, the system gets wedged for minutes before it
> eventually OOMs.
— https://lore.kernel.org/patchwork/patch/222042/
This patch can almost completely eliminate thrashing under memory pressure.
Effects:
- Improved system responsiveness under low-memory conditions;
- Improved performance in I/O-bound tasks under memory pressure;
- The OOM killer comes faster (with hard protection);
- Fast system recovery after OOM.
Read more: https://github.com/hakavlad/le9-patch
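For illustration, with this patch applied the new knobs would be tuned like any other vm sysctl. The values below are examples chosen for a desktop-like workload, not recommendations from the patch itself:

```shell
# Best-effort protection: try to keep ~250 MiB of clean file pages resident
# unless we threaten to OOM, have no swap space, or vm.swappiness=0.
sysctl vm.clean_low_kbytes=250000

# Hard protection: never reclaim below ~50 MiB of clean file pages,
# preferring an earlier OOM kill over livelock.
sysctl vm.clean_min_kbytes=50000

# To persist across reboots, the same settings would go in a sysctl.d file:
# echo 'vm.clean_low_kbytes=250000' >> /etc/sysctl.d/90-clean-protect.conf
# echo 'vm.clean_min_kbytes=50000'  >> /etc/sysctl.d/90-clean-protect.conf
```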
The patch:
From 371e3e5290652e97d5279d8cd215cd356c1fb47b Mon Sep 17 00:00:00 2001
From: Alexey Avramov <hakavlad@inbox.lv>
Date: Mon, 5 Apr 2021 01:53:26 +0900
Subject: [PATCH] mm/vmscan: add sysctl knobs for protecting the specified
amount of clean file cache
The kernel does not have a mechanism for targeted protection of clean
file pages (CFP). A certain amount of the CFP is required by the userspace
for normal operation. First of all, you need a cache of shared libraries
and executable files. If the volume of the CFP cache falls below a certain
level, thrashing and even livelock occurs.
Protection of CFP may be used to prevent thrashing and reduce I/O under
memory pressure. Hard protection of CFP may be used to avoid high latency
and prevent livelock in near-OOM conditions. The patch provides sysctl
knobs for protecting the specified amount of clean file cache under memory
pressure.
The vm.clean_low_kbytes sysctl knob provides *best-effort* protection of
CFP. The CFP on the current node won't be reclaimed under memory pressure
when their volume is below vm.clean_low_kbytes *unless* we threaten to OOM
or have no swap space or vm.swappiness=0. Setting it to a high value may
result in an early eviction of anonymous pages into the swap space by
attempting to hold the protected amount of clean file pages in memory. The
default value is defined by CONFIG_CLEAN_LOW_KBYTES (suggested 0 in
Kconfig).
The vm.clean_min_kbytes sysctl knob provides *hard* protection of CFP. The
CFP on the current node won't be reclaimed under memory pressure when their
volume is below vm.clean_min_kbytes. Setting it to a high value may result
in an early out-of-memory condition due to the inability to reclaim the
protected amount of CFP when other types of pages cannot be reclaimed. The
default value is defined by CONFIG_CLEAN_MIN_KBYTES (suggested 0 in
Kconfig).
Reported-by: Artem S. Tashkinov <aros@gmx.com>
Signed-off-by: Alexey Avramov <hakavlad@inbox.lv>
---
Documentation/admin-guide/sysctl/vm.rst | 37 +++++++++++++++++++++
include/linux/mm.h | 3 ++
kernel/sysctl.c | 14 ++++++++
mm/Kconfig | 35 +++++++++++++++++++
mm/vmscan.c | 59 +++++++++++++++++++++++++++++++++
5 files changed, 148 insertions(+)
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f455fa00c..5d5ddfc85 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -26,6 +26,8 @@ Currently, these files are in /proc/sys/vm:
- admin_reserve_kbytes
- block_dump
+- clean_low_kbytes
+- clean_min_kbytes
- compact_memory
- compaction_proactiveness
- compact_unevictable_allowed
@@ -113,6 +115,41 @@ block_dump enables block I/O debugging when set to a nonzero value. More
information on block I/O debugging is in Documentation/admin-guide/laptops/laptop-mode.rst.
+clean_low_kbytes
+=====================
+
+This knob provides *best-effort* protection of clean file pages. The clean file
+pages on the current node won't be reclaimed under memory pressure when their
+volume is below vm.clean_low_kbytes *unless* we threaten to OOM or have no
+swap space or vm.swappiness=0.
+
+Protection of clean file pages may be used to prevent thrashing and
+reduce I/O under low-memory conditions.
+
+Setting it to a high value may result in an early eviction of anonymous pages
+into the swap space by attempting to hold the protected amount of clean file
+pages in memory.
+
+The default value is defined by CONFIG_CLEAN_LOW_KBYTES.
+
+
+clean_min_kbytes
+=====================
+
+This knob provides *hard* protection of clean file pages. The clean file pages
+on the current node won't be reclaimed under memory pressure when their volume
+is below vm.clean_min_kbytes.
+
+Hard protection of clean file pages may be used to avoid high latency and
+prevent livelock in near-OOM conditions.
+
+Setting it to a high value may result in an early out-of-memory condition due to
+the inability to reclaim the protected amount of clean file pages when other
+types of pages cannot be reclaimed.
+
+The default value is defined by CONFIG_CLEAN_MIN_KBYTES.
+
+
compact_memory
==============
diff --git a/include/linux/mm.h b/include/linux/mm.h
index db6ae4d3f..7799f1555 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -202,6 +202,9 @@ static inline void __mm_zero_struct_page(struct page *page)
extern int sysctl_max_map_count;
+extern unsigned long sysctl_clean_low_kbytes;
+extern unsigned long sysctl_clean_min_kbytes;
+
extern unsigned long sysctl_user_reserve_kbytes;
extern unsigned long sysctl_admin_reserve_kbytes;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index afad08596..854b311cd 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -3083,6 +3083,20 @@ static struct ctl_table vm_table[] = {
},
#endif
{
+ .procname = "clean_low_kbytes",
+ .data = &sysctl_clean_low_kbytes,
+ .maxlen = sizeof(sysctl_clean_low_kbytes),
+ .mode = 0644,
+ .proc_handler = proc_doulongvec_minmax,
+ },
+ {
+ .procname = "clean_min_kbytes",
+ .data = &sysctl_clean_min_kbytes,
+ .maxlen = sizeof(sysctl_clean_min_kbytes),
+ .mode = 0644,
+ .proc_handler = proc_doulongvec_minmax,
+ },
+ {
.procname = "user_reserve_kbytes",
.data = &sysctl_user_reserve_kbytes,
.maxlen = sizeof(sysctl_user_reserve_kbytes),
diff --git a/mm/Kconfig b/mm/Kconfig
index 390165ffb..3915c71e1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -122,6 +122,41 @@ config SPARSEMEM_VMEMMAP
pfn_to_page and page_to_pfn operations. This is the most
efficient option when sufficient kernel resources are available.
+config CLEAN_LOW_KBYTES
+ int "Default value for vm.clean_low_kbytes"
+ depends on SYSCTL
+ default "0"
+ help
+ The vm.clean_low_kbytes sysctl knob provides *best-effort*
+ protection of clean file pages. The clean file pages on the current
+ node won't be reclaimed under memory pressure when their volume is
+ below vm.clean_low_kbytes *unless* we threaten to OOM or have
+ no swap space or vm.swappiness=0.
+
+ Protection of clean file pages may be used to prevent thrashing and
+ reduce I/O under low-memory conditions.
+
+ Setting it to a high value may result in an early eviction of anonymous
+ pages into the swap space by attempting to hold the protected amount of
+ clean file pages in memory.
+
+config CLEAN_MIN_KBYTES
+ int "Default value for vm.clean_min_kbytes"
+ depends on SYSCTL
+ default "0"
+ help
+ The vm.clean_min_kbytes sysctl knob provides *hard* protection
+ of clean file pages. The clean file pages on the current node won't be
+ reclaimed under memory pressure when their volume is below
+ vm.clean_min_kbytes.
+
+ Hard protection of clean file pages may be used to avoid high latency and
+ prevent livelock in near-OOM conditions.
+
+ Setting it to a high value may result in an early out-of-memory condition
+ due to the inability to reclaim the protected amount of clean file pages
+ when other types of pages cannot be reclaimed.
+
config HAVE_MEMBLOCK_PHYS_MAP
bool
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7b4e31eac..77e98c43e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -120,6 +120,19 @@ struct scan_control {
/* The file pages on the current node are dangerously low */
unsigned int file_is_tiny:1;
+ /*
+ * The clean file pages on the current node won't be reclaimed when
+ * their volume is below vm.clean_low_kbytes *unless* we threaten
+ * to OOM or have no swap space or vm.swappiness=0.
+ */
+ unsigned int clean_below_low:1;
+
+ /*
+ * The clean file pages on the current node won't be reclaimed when
+ * their volume is below vm.clean_min_kbytes.
+ */
+ unsigned int clean_below_min:1;
+
/* Allocation order */
s8 order;
@@ -166,6 +179,17 @@ struct scan_control {
#define prefetchw_prev_lru_page(_page, _base, _field) do { } while (0)
#endif
+#if CONFIG_CLEAN_LOW_KBYTES < 0
+#error "CONFIG_CLEAN_LOW_KBYTES must be >= 0"
+#endif
+
+#if CONFIG_CLEAN_MIN_KBYTES < 0
+#error "CONFIG_CLEAN_MIN_KBYTES must be >= 0"
+#endif
+
+unsigned long sysctl_clean_low_kbytes __read_mostly = CONFIG_CLEAN_LOW_KBYTES;
+unsigned long sysctl_clean_min_kbytes __read_mostly = CONFIG_CLEAN_MIN_KBYTES;
+
/*
* From 0 .. 200. Higher means more swappy.
*/
@@ -2283,6 +2307,16 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
}
/*
+ * Force-scan anon if clean file pages are under vm.clean_min_kbytes
+ * or vm.clean_low_kbytes (unless the swappiness setting
+ * disagrees with swapping).
+ */
+ if ((sc->clean_below_low || sc->clean_below_min) && swappiness) {
+ scan_balance = SCAN_ANON;
+ goto out;
+ }
+
+ /*
* If there is enough inactive page cache, we do not reclaim
* anything from the anonymous working right now.
*/
@@ -2418,6 +2452,13 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
BUG();
}
+ /*
+ * Don't reclaim clean file pages when their volume is below
+ * vm.clean_min_kbytes.
+ */
+ if (file && sc->clean_below_min)
+ scan = 0;
+
nr[lru] = scan;
}
}
@@ -2768,6 +2809,24 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
anon >> sc->priority;
}
+ if (sysctl_clean_low_kbytes || sysctl_clean_min_kbytes) {
+ unsigned long reclaimable_file, dirty, clean;
+
+ reclaimable_file =
+ node_page_state(pgdat, NR_ACTIVE_FILE) +
+ node_page_state(pgdat, NR_INACTIVE_FILE) +
+ node_page_state(pgdat, NR_ISOLATED_FILE);
+ dirty = node_page_state(pgdat, NR_FILE_DIRTY);
+ if (reclaimable_file > dirty)
+ clean = (reclaimable_file - dirty) << (PAGE_SHIFT - 10);
+
+ sc->clean_below_low = clean < sysctl_clean_low_kbytes;
+ sc->clean_below_min = clean < sysctl_clean_min_kbytes;
+ } else {
+ sc->clean_below_low = false;
+ sc->clean_below_min = false;
+ }
+
shrink_node_memcgs(pgdat, sc);
if (reclaim_state) {
--
2.11.0
* Re: [PATCH] mm/vmscan: add sysctl knobs for protecting the specified
2021-04-05 21:59 ` Alexey Avramov
@ 2021-04-06 0:15 ` kernel test robot
2021-04-06 1:16 ` kernel test robot
1 sibling, 0 replies; 4+ messages in thread
From: kernel test robot @ 2021-04-06 0:15 UTC (permalink / raw)
To: Alexey Avramov, Stillinux
Cc: kbuild-all, akpm, linux-mm, linux-kernel, liuzhengyuan, liuyun01
[-- Attachment #1: Type: text/plain, Size: 3048 bytes --]
Hi Alexey,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linux/master]
[also build test ERROR on linus/master v5.12-rc6 next-20210401]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Alexey-Avramov/mm-vmscan-add-sysctl-knobs-for-protecting-the-specified/20210406-061034
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 5e46d1b78a03d52306f21f77a4e4a144b6d31486
config: parisc-randconfig-m031-20210405 (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/a5eeb8d197a8e10c333422e9cc0f2c7d976a3426
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Alexey-Avramov/mm-vmscan-add-sysctl-knobs-for-protecting-the-specified/20210406-061034
git checkout a5eeb8d197a8e10c333422e9cc0f2c7d976a3426
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All error/warnings (new ones prefixed by >>):
>> mm/vmscan.c:180:5: warning: "CONFIG_CLEAN_LOW_KBYTES" is not defined, evaluates to 0 [-Wundef]
180 | #if CONFIG_CLEAN_LOW_KBYTES < 0
| ^~~~~~~~~~~~~~~~~~~~~~~
>> mm/vmscan.c:184:5: warning: "CONFIG_CLEAN_MIN_KBYTES" is not defined, evaluates to 0 [-Wundef]
184 | #if CONFIG_CLEAN_MIN_KBYTES < 0
| ^~~~~~~~~~~~~~~~~~~~~~~
>> mm/vmscan.c:188:55: error: 'CONFIG_CLEAN_LOW_KBYTES' undeclared here (not in a function)
188 | unsigned long sysctl_clean_low_kbytes __read_mostly = CONFIG_CLEAN_LOW_KBYTES;
| ^~~~~~~~~~~~~~~~~~~~~~~
>> mm/vmscan.c:189:55: error: 'CONFIG_CLEAN_MIN_KBYTES' undeclared here (not in a function)
189 | unsigned long sysctl_clean_min_kbytes __read_mostly = CONFIG_CLEAN_MIN_KBYTES;
| ^~~~~~~~~~~~~~~~~~~~~~~
vim +/CONFIG_CLEAN_LOW_KBYTES +188 mm/vmscan.c
179
> 180 #if CONFIG_CLEAN_LOW_KBYTES < 0
181 #error "CONFIG_CLEAN_LOW_KBYTES must be >= 0"
182 #endif
183
> 184 #if CONFIG_CLEAN_MIN_KBYTES < 0
185 #error "CONFIG_CLEAN_MIN_KBYTES must be >= 0"
186 #endif
187
> 188 unsigned long sysctl_clean_low_kbytes __read_mostly = CONFIG_CLEAN_LOW_KBYTES;
> 189 unsigned long sysctl_clean_min_kbytes __read_mostly = CONFIG_CLEAN_MIN_KBYTES;
190
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26587 bytes --]
* Re: [PATCH] mm/vmscan: add sysctl knobs for protecting the specified
2021-04-05 21:59 ` Alexey Avramov
2021-04-06 0:15 ` [PATCH] mm/vmscan: add sysctl knobs for protecting the specified kernel test robot
@ 2021-04-06 1:16 ` kernel test robot
1 sibling, 0 replies; 4+ messages in thread
From: kernel test robot @ 2021-04-06 1:16 UTC (permalink / raw)
To: Alexey Avramov, Stillinux
Cc: kbuild-all, clang-built-linux, akpm, linux-mm, linux-kernel,
liuzhengyuan, liuyun01
[-- Attachment #1: Type: text/plain, Size: 16131 bytes --]
Hi Alexey,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on linux/master]
[also build test WARNING on linus/master v5.12-rc6 next-20210401]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Alexey-Avramov/mm-vmscan-add-sysctl-knobs-for-protecting-the-specified/20210406-061034
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 5e46d1b78a03d52306f21f77a4e4a144b6d31486
config: s390-randconfig-r006-20210405 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project a46f59a747a7273cc439efaf3b4f98d8b63d2f20)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install s390 cross compiling tool for clang build
# apt-get install binutils-s390x-linux-gnu
# https://github.com/0day-ci/linux/commit/a5eeb8d197a8e10c333422e9cc0f2c7d976a3426
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Alexey-Avramov/mm-vmscan-add-sysctl-knobs-for-protecting-the-specified/20210406-061034
git checkout a5eeb8d197a8e10c333422e9cc0f2c7d976a3426
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=s390
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
In file included from mm/vmscan.c:20:
In file included from include/linux/swap.h:9:
In file included from include/linux/memcontrol.h:22:
In file included from include/linux/writeback.h:14:
In file included from include/linux/blk-cgroup.h:23:
In file included from include/linux/blkdev.h:26:
In file included from include/linux/scatterlist.h:9:
In file included from arch/s390/include/asm/io.h:80:
include/asm-generic/io.h:464:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __raw_readb(PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:36:59: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
^
include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
#define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
^
In file included from mm/vmscan.c:20:
In file included from include/linux/swap.h:9:
In file included from include/linux/memcontrol.h:22:
In file included from include/linux/writeback.h:14:
In file included from include/linux/blk-cgroup.h:23:
In file included from include/linux/blkdev.h:26:
In file included from include/linux/scatterlist.h:9:
In file included from arch/s390/include/asm/io.h:80:
include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
^
include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
#define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
^
In file included from mm/vmscan.c:20:
In file included from include/linux/swap.h:9:
In file included from include/linux/memcontrol.h:22:
In file included from include/linux/writeback.h:14:
In file included from include/linux/blk-cgroup.h:23:
In file included from include/linux/blkdev.h:26:
In file included from include/linux/scatterlist.h:9:
In file included from arch/s390/include/asm/io.h:80:
include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writeb(value, PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:511:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:521:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:609:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
readsb(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:617:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
readsw(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:625:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
readsl(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:634:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
writesb(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:643:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
writesw(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:652:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
writesl(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
>> mm/vmscan.c:2819:7: warning: variable 'clean' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
if (reclaimable_file > dirty)
^~~~~~~~~~~~~~~~~~~~~~~~
mm/vmscan.c:2822:25: note: uninitialized use occurs here
sc->clean_below_low = clean < sysctl_clean_low_kbytes;
^~~~~
mm/vmscan.c:2819:3: note: remove the 'if' if its condition is always true
if (reclaimable_file > dirty)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/vmscan.c:2812:47: note: initialize the variable 'clean' to silence this warning
unsigned long reclaimable_file, dirty, clean;
^
= 0
13 warnings generated.
vim +2819 mm/vmscan.c
2706
2707 static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
2708 {
2709 struct reclaim_state *reclaim_state = current->reclaim_state;
2710 unsigned long nr_reclaimed, nr_scanned;
2711 struct lruvec *target_lruvec;
2712 bool reclaimable = false;
2713 unsigned long file;
2714
2715 target_lruvec = mem_cgroup_lruvec(sc->target_mem_cgroup, pgdat);
2716
2717 again:
2718 memset(&sc->nr, 0, sizeof(sc->nr));
2719
2720 nr_reclaimed = sc->nr_reclaimed;
2721 nr_scanned = sc->nr_scanned;
2722
2723 /*
2724 * Determine the scan balance between anon and file LRUs.
2725 */
2726 spin_lock_irq(&target_lruvec->lru_lock);
2727 sc->anon_cost = target_lruvec->anon_cost;
2728 sc->file_cost = target_lruvec->file_cost;
2729 spin_unlock_irq(&target_lruvec->lru_lock);
2730
2731 /*
2732 * Target desirable inactive:active list ratios for the anon
2733 * and file LRU lists.
2734 */
2735 if (!sc->force_deactivate) {
2736 unsigned long refaults;
2737
2738 refaults = lruvec_page_state(target_lruvec,
2739 WORKINGSET_ACTIVATE_ANON);
2740 if (refaults != target_lruvec->refaults[0] ||
2741 inactive_is_low(target_lruvec, LRU_INACTIVE_ANON))
2742 sc->may_deactivate |= DEACTIVATE_ANON;
2743 else
2744 sc->may_deactivate &= ~DEACTIVATE_ANON;
2745
2746 /*
2747 * When refaults are being observed, it means a new
2748 * workingset is being established. Deactivate to get
2749 * rid of any stale active pages quickly.
2750 */
2751 refaults = lruvec_page_state(target_lruvec,
2752 WORKINGSET_ACTIVATE_FILE);
2753 if (refaults != target_lruvec->refaults[1] ||
2754 inactive_is_low(target_lruvec, LRU_INACTIVE_FILE))
2755 sc->may_deactivate |= DEACTIVATE_FILE;
2756 else
2757 sc->may_deactivate &= ~DEACTIVATE_FILE;
2758 } else
2759 sc->may_deactivate = DEACTIVATE_ANON | DEACTIVATE_FILE;
2760
2761 /*
2762 * If we have plenty of inactive file pages that aren't
2763 * thrashing, try to reclaim those first before touching
2764 * anonymous pages.
2765 */
2766 file = lruvec_page_state(target_lruvec, NR_INACTIVE_FILE);
2767 if (file >> sc->priority && !(sc->may_deactivate & DEACTIVATE_FILE))
2768 sc->cache_trim_mode = 1;
2769 else
2770 sc->cache_trim_mode = 0;
2771
2772 /*
2773 * Prevent the reclaimer from falling into the cache trap: as
2774 * cache pages start out inactive, every cache fault will tip
2775 * the scan balance towards the file LRU. And as the file LRU
2776 * shrinks, so does the window for rotation from references.
2777 * This means we have a runaway feedback loop where a tiny
2778 * thrashing file LRU becomes infinitely more attractive than
2779 * anon pages. Try to detect this based on file LRU size.
2780 */
2781 if (!cgroup_reclaim(sc)) {
2782 unsigned long total_high_wmark = 0;
2783 unsigned long free, anon;
2784 int z;
2785
2786 free = sum_zone_node_page_state(pgdat->node_id, NR_FREE_PAGES);
2787 file = node_page_state(pgdat, NR_ACTIVE_FILE) +
2788 node_page_state(pgdat, NR_INACTIVE_FILE);
2789
2790 for (z = 0; z < MAX_NR_ZONES; z++) {
2791 struct zone *zone = &pgdat->node_zones[z];
2792 if (!managed_zone(zone))
2793 continue;
2794
2795 total_high_wmark += high_wmark_pages(zone);
2796 }
2797
2798 /*
2799 * Consider anon: if that's low too, this isn't a
2800 * runaway file reclaim problem, but rather just
2801 * extreme pressure. Reclaim as per usual then.
2802 */
2803 anon = node_page_state(pgdat, NR_INACTIVE_ANON);
2804
2805 sc->file_is_tiny =
2806 file + free <= total_high_wmark &&
2807 !(sc->may_deactivate & DEACTIVATE_ANON) &&
2808 anon >> sc->priority;
2809 }
2810
2811 if (sysctl_clean_low_kbytes || sysctl_clean_min_kbytes) {
2812 unsigned long reclaimable_file, dirty, clean;
2813
2814 reclaimable_file =
2815 node_page_state(pgdat, NR_ACTIVE_FILE) +
2816 node_page_state(pgdat, NR_INACTIVE_FILE) +
2817 node_page_state(pgdat, NR_ISOLATED_FILE);
2818 dirty = node_page_state(pgdat, NR_FILE_DIRTY);
> 2819 if (reclaimable_file > dirty)
2820 clean = (reclaimable_file - dirty) << (PAGE_SHIFT - 10);
2821
2822 sc->clean_below_low = clean < sysctl_clean_low_kbytes;
2823 sc->clean_below_min = clean < sysctl_clean_min_kbytes;
2824 } else {
2825 sc->clean_below_low = false;
2826 sc->clean_below_min = false;
2827 }
2828
2829 shrink_node_memcgs(pgdat, sc);
2830
2831 if (reclaim_state) {
2832 sc->nr_reclaimed += reclaim_state->reclaimed_slab;
2833 reclaim_state->reclaimed_slab = 0;
2834 }
2835
2836 /* Record the subtree's reclaim efficiency */
2837 vmpressure(sc->gfp_mask, sc->target_mem_cgroup, true,
2838 sc->nr_scanned - nr_scanned,
2839 sc->nr_reclaimed - nr_reclaimed);
2840
2841 if (sc->nr_reclaimed - nr_reclaimed)
2842 reclaimable = true;
2843
2844 if (current_is_kswapd()) {
2845 /*
2846 * If reclaim is isolating dirty pages under writeback,
2847 * it implies that the long-lived page allocation rate
2848 * is exceeding the page laundering rate. Either the
2849 * global limits are not being effective at throttling
2850 * processes due to the page distribution throughout
2851 * zones or there is heavy usage of a slow backing
2852 * device. The only option is to throttle from reclaim
2853 * context which is not ideal as there is no guarantee
2854 * the dirtying process is throttled in the same way
2855 * balance_dirty_pages() manages.
2856 *
2857 * Once a node is flagged PGDAT_WRITEBACK, kswapd will
2858 * count the number of pages under pages flagged for
2859 * immediate reclaim and stall if any are encountered
2860 * in the nr_immediate check below.
2861 */
2862 if (sc->nr.writeback && sc->nr.writeback == sc->nr.taken)
2863 set_bit(PGDAT_WRITEBACK, &pgdat->flags);
2864
2865 /* Allow kswapd to start writing pages during reclaim.*/
2866 if (sc->nr.unqueued_dirty == sc->nr.file_taken)
2867 set_bit(PGDAT_DIRTY, &pgdat->flags);
2868
2869 /*
2870 * If kswapd scans pages marked for immediate
2871 * reclaim and under writeback (nr_immediate), it
2872 * implies that pages are cycling through the LRU
2873 * faster than they are written so also forcibly stall.
2874 */
2875 if (sc->nr.immediate)
2876 congestion_wait(BLK_RW_ASYNC, HZ/10);
2877 }
2878
2879 /*
2880 * Tag a node/memcg as congested if all the dirty pages
2881 * scanned were backed by a congested BDI and
2882 * wait_iff_congested will stall.
2883 *
2884 * Legacy memcg will stall in page writeback so avoid forcibly
2885 * stalling in wait_iff_congested().
2886 */
2887 if ((current_is_kswapd() ||
2888 (cgroup_reclaim(sc) && writeback_throttling_sane(sc))) &&
2889 sc->nr.dirty && sc->nr.dirty == sc->nr.congested)
2890 set_bit(LRUVEC_CONGESTED, &target_lruvec->flags);
2891
2892 /*
2893 * Stall direct reclaim for IO completions if underlying BDIs
2894 * and node is congested. Allow kswapd to continue until it
2895 * starts encountering unqueued dirty pages or cycling through
2896 * the LRU too quickly.
2897 */
2898 if (!current_is_kswapd() && current_may_throttle() &&
2899 !sc->hibernation_mode &&
2900 test_bit(LRUVEC_CONGESTED, &target_lruvec->flags))
2901 wait_iff_congested(BLK_RW_ASYNC, HZ/10);
2902
2903 if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
2904 sc))
2905 goto again;
2906
2907 /*
2908 * Kswapd gives up on balancing particular nodes after too
2909 * many failures to reclaim anything from them and goes to
2910 * sleep. On reclaim progress, reset the failure counter. A
2911 * successful direct reclaim run will revive a dormant kswapd.
2912 */
2913 if (reclaimable)
2914 pgdat->kswapd_failures = 0;
2915 }
2916
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 11322 bytes --]