* Re: [RFC PATCH] mm/swap: fix system stuck due to infinite loop
  [not found] <CAKN5gChSwSs1Zy1r7iXHw7ZSKy7Nkr3NqcqJSn7z9yZPr3J2AA@mail.gmail.com>
@ 2021-04-03  0:44 ` Andrew Morton
  2021-04-05 21:59 ` Alexey Avramov
  1 sibling, 0 replies; 4+ messages in thread

From: Andrew Morton @ 2021-04-03  0:44 UTC (permalink / raw)
  To: Stillinux
  Cc: linux-mm, linux-kernel, liuzhengyuan, liuyun01, Johannes Weiner,
      Hugh Dickins

On Fri, 2 Apr 2021 15:03:37 +0800 Stillinux <stillinux@gmail.com> wrote:

> Under high memory and load pressure, we ran the LTP test suite and found
> that the system got stuck: direct memory reclaim was stuck in
> io_schedule(), the waiting request was sitting in the blk_plug of one
> process, and that process had fallen into an infinite loop that never
> flushed the plugged request out.
>
> The call flow of this process is swap_cluster_readahead(), which brackets
> its I/O with blk_start_plug()/blk_finish_plug() and calls
> __read_swap_cache_async() -> swapcache_prepare(). When swapcache_prepare()
> returns -EEXIST, the caller retries in an infinite loop. Even though
> cond_resched() is called, sched_submit_work() decides based on tsk->state
> whether to flush the blk_plug, and while the task stays runnable it does
> not, so the plugged I/O is never submitted. The I/O hangs, and the whole
> system hangs with it.
>
> As this is our first time in the swap code, we have no good way to fix
> the underlying problem. As an engineering workaround, we chose to make
> swap_cluster_readahead() aware of memory pressure as early as possible by
> changing the allocation flag in swap_readpage() to GFP_NOIO, so the bio
> allocation no longer performs memory reclaim that would flush I/O. With
> this the system operates normally, but it is not the most fundamental fix.
> Thanks.

I'm not understanding why swapcache_prepare() repeatedly returns -EEXIST
in this situation?

And how does the switch to GFP_NOIO fix this?  Simply by avoiding direct
reclaim altogether?

> ---
>  mm/page_io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page_io.c b/mm/page_io.c
> index c493ce9ebcf5..87392ffabb12 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -403,7 +403,7 @@ int swap_readpage(struct page *page, bool synchronous)
>  	}
>
>  	ret = 0;
> -	bio = bio_alloc(GFP_KERNEL, 1);
> +	bio = bio_alloc(GFP_NOIO, 1);
>  	bio_set_dev(bio, sis->bdev);
>  	bio->bi_opf = REQ_OP_READ;
>  	bio->bi_iter.bi_sector = swap_page_sector(page);

^ permalink raw reply	[flat|nested] 4+ messages in thread
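The failure mode described in the report can be sketched as a userspace model (all names below are illustrative stand-ins, not the real kernel API): the retry loop only makes progress if something flushes the plug, and since cond_resched() flushes it only for a task about to sleep, a task that stays runnable spins forever.

```c
#include <stdbool.h>

/* Hypothetical model of the plugged request. */
struct plug { bool flushed; };

/* Stands in for swapcache_prepare(): keeps failing with -EEXIST until
 * the plugged read that would settle the swap-cache entry completes. */
static int fake_swapcache_prepare(const struct plug *p)
{
	return p->flushed ? 0 : -17; /* -EEXIST */
}

/* Stands in for the retry loop in __read_swap_cache_async(). Returns
 * the iteration on which it succeeded, or -1 if no progress was made. */
static int retry_loop(struct plug *p, bool resched_flushes_plug, int max_iters)
{
	for (int i = 0; i < max_iters; i++) {
		if (fake_swapcache_prepare(p) == 0)
			return i;
		/* cond_resched(): sched_submit_work() flushes the plug
		 * only when the task is about to sleep; while it stays
		 * runnable, nothing is submitted. */
		if (resched_flushes_plug)
			p->flushed = true;
	}
	return -1; /* the livelock seen in the report */
}
```

With `resched_flushes_plug` false the loop never terminates on its own, which is the hang the RFC describes; flushing on the first reschedule lets the very next retry succeed.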
* Re: [RFC PATCH] mm/swap: fix system stuck due to infinite loop
  [not found] <CAKN5gChSwSs1Zy1r7iXHw7ZSKy7Nkr3NqcqJSn7z9yZPr3J2AA@mail.gmail.com>
  2021-04-03  0:44 ` [RFC PATCH] mm/swap: fix system stuck due to infinite loop Andrew Morton
@ 2021-04-05 21:59 ` Alexey Avramov
  2021-04-06  0:15   ` [PATCH] mm/vmscan: add sysctl knobs for protecting the specified kernel test robot
  2021-04-06  1:16   ` kernel test robot
  1 sibling, 2 replies; 4+ messages in thread

From: Alexey Avramov @ 2021-04-05 21:59 UTC (permalink / raw)
  To: Stillinux; +Cc: akpm, linux-mm, linux-kernel, liuzhengyuan, liuyun01

> In the case of high system memory and load pressure, we ran ltp test
> and found that the system was stuck, the direct memory reclaim was
> all stuck in io_schedule

> For the first time involving the swap part, there is no good way to fix
> the problem

The solution is protecting the clean file pages. Look at this:

> On ChromiumOS, we do not use swap. When memory is low, the only
> way to free memory is to reclaim pages from the file list. This
> results in a lot of thrashing under low memory conditions. We see
> the system become unresponsive for minutes before it eventually OOMs.
> We also see very slow browser tab switching under low memory. Instead
> of an unresponsive system, we'd really like the kernel to OOM as soon
> as it starts to thrash. If it can't keep the working set in memory,
> then OOM. Losing one of many tabs is a better behaviour for the user
> than an unresponsive system.
>
> This patch creates a new sysctl, min_filelist_kbytes, which disables
> reclaim of file-backed pages when there are less than min_filelist_kbytes
> worth of such pages in the cache. This tunable is handy for low memory
> systems using solid-state storage where interactive response is more
> important than not OOMing.
>
> With this patch and min_filelist_kbytes set to 50000, I see very little
> block layer activity during low memory. The system stays responsive under
> low memory and browser tab switching is fast. Eventually, a process gets
> killed by OOM. Without this patch, the system gets wedged for minutes
> before it eventually OOMs.

— https://lore.kernel.org/patchwork/patch/222042/

This patch can almost completely eliminate thrashing under memory pressure.

Effects:
- Improved system responsiveness under low-memory conditions;
- Improved performance in I/O-bound tasks under memory pressure;
- The OOM killer comes faster (with hard protection);
- Fast system recovery after OOM.

Read more: https://github.com/hakavlad/le9-patch

The patch:

From 371e3e5290652e97d5279d8cd215cd356c1fb47b Mon Sep 17 00:00:00 2001
From: Alexey Avramov <hakavlad@inbox.lv>
Date: Mon, 5 Apr 2021 01:53:26 +0900
Subject: [PATCH] mm/vmscan: add sysctl knobs for protecting the specified
 amount of clean file cache

The kernel does not have a mechanism for targeted protection of clean
file pages (CFP). A certain amount of CFP is required by the userspace
for normal operation: first of all, a cache of shared libraries and
executable files. If the volume of the CFP cache falls below a certain
level, thrashing and even livelock occur.

Protection of CFP may be used to prevent thrashing and reduce I/O under
memory pressure. Hard protection of CFP may be used to avoid high
latency and prevent livelock in near-OOM conditions. The patch provides
sysctl knobs for protecting the specified amount of clean file cache
under memory pressure.

The vm.clean_low_kbytes sysctl knob provides *best-effort* protection
of CFP. The CFP on the current node won't be reclaimed under memory
pressure when their volume is below vm.clean_low_kbytes *unless* we
threaten to OOM, have no swap space, or vm.swappiness=0. Setting it to
a high value may result in an early eviction of anonymous pages into
swap space by attempting to hold the protected amount of clean file
pages in memory. The default value is defined by CONFIG_CLEAN_LOW_KBYTES
(suggested 0 in Kconfig).

The vm.clean_min_kbytes sysctl knob provides *hard* protection of CFP.
The CFP on the current node won't be reclaimed under memory pressure
when their volume is below vm.clean_min_kbytes. Setting it to a high
value may result in an early out-of-memory condition due to the
inability to reclaim the protected amount of CFP when other types of
pages cannot be reclaimed. The default value is defined by
CONFIG_CLEAN_MIN_KBYTES (suggested 0 in Kconfig).

Reported-by: Artem S. Tashkinov <aros@gmx.com>
Signed-off-by: Alexey Avramov <hakavlad@inbox.lv>
---
 Documentation/admin-guide/sysctl/vm.rst | 37 +++++++++++++++++++++
 include/linux/mm.h                      |  3 ++
 kernel/sysctl.c                         | 14 ++++++++
 mm/Kconfig                              | 35 +++++++++++++++++++
 mm/vmscan.c                             | 59 +++++++++++++++++++++++++++++++++
 5 files changed, 148 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f455fa00c..5d5ddfc85 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -26,6 +26,8 @@ Currently, these files are in /proc/sys/vm:
 - admin_reserve_kbytes
 - block_dump
+- clean_low_kbytes
+- clean_min_kbytes
 - compact_memory
 - compaction_proactiveness
 - compact_unevictable_allowed
@@ -113,6 +115,41 @@
 block_dump enables block I/O debugging when set to a nonzero value. More
 information on block I/O debugging is in
 Documentation/admin-guide/laptops/laptop-mode.rst.
 
+clean_low_kbytes
+================
+
+This knob provides *best-effort* protection of clean file pages. The clean file
+pages on the current node won't be reclaimed under memory pressure when their
+volume is below vm.clean_low_kbytes *unless* we threaten to OOM or have no
+swap space or vm.swappiness=0.
+
+Protection of clean file pages may be used to prevent thrashing and
+reduce I/O under low-memory conditions.
+
+Setting it to a high value may result in an early eviction of anonymous pages
+into the swap space by attempting to hold the protected amount of clean file
+pages in memory.
+
+The default value is defined by CONFIG_CLEAN_LOW_KBYTES.
+
+
+clean_min_kbytes
+================
+
+This knob provides *hard* protection of clean file pages. The clean file pages
+on the current node won't be reclaimed under memory pressure when their volume
+is below vm.clean_min_kbytes.
+
+Hard protection of clean file pages may be used to avoid high latency and
+prevent livelock in near-OOM conditions.
+
+Setting it to a high value may result in an early out-of-memory condition due
+to the inability to reclaim the protected amount of clean file pages when other
+types of pages cannot be reclaimed.
+
+The default value is defined by CONFIG_CLEAN_MIN_KBYTES.
+
+
 compact_memory
 ==============
diff --git a/include/linux/mm.h b/include/linux/mm.h
index db6ae4d3f..7799f1555 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -202,6 +202,9 @@ static inline void __mm_zero_struct_page(struct page *page)
 
 extern int sysctl_max_map_count;
 
+extern unsigned long sysctl_clean_low_kbytes;
+extern unsigned long sysctl_clean_min_kbytes;
+
 extern unsigned long sysctl_user_reserve_kbytes;
 extern unsigned long sysctl_admin_reserve_kbytes;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index afad08596..854b311cd 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -3083,6 +3083,20 @@ static struct ctl_table vm_table[] = {
 	},
 #endif
 	{
+		.procname	= "clean_low_kbytes",
+		.data		= &sysctl_clean_low_kbytes,
+		.maxlen		= sizeof(sysctl_clean_low_kbytes),
+		.mode		= 0644,
+		.proc_handler	= proc_doulongvec_minmax,
+	},
+	{
+		.procname	= "clean_min_kbytes",
+		.data		= &sysctl_clean_min_kbytes,
+		.maxlen		= sizeof(sysctl_clean_min_kbytes),
+		.mode		= 0644,
+		.proc_handler	= proc_doulongvec_minmax,
+	},
+	{
 		.procname	= "user_reserve_kbytes",
 		.data		= &sysctl_user_reserve_kbytes,
 		.maxlen		= sizeof(sysctl_user_reserve_kbytes),
diff --git a/mm/Kconfig b/mm/Kconfig
index 390165ffb..3915c71e1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -122,6 +122,41 @@ config SPARSEMEM_VMEMMAP
 	  pfn_to_page and page_to_pfn operations.  This is the most
 	  efficient option when sufficient kernel resources are available.
 
+config CLEAN_LOW_KBYTES
+	int "Default value for vm.clean_low_kbytes"
+	depends on SYSCTL
+	default "0"
+	help
+	  The vm.clean_low_kbytes sysctl knob provides *best-effort*
+	  protection of clean file pages. The clean file pages on the current
+	  node won't be reclaimed under memory pressure when their volume is
+	  below vm.clean_low_kbytes *unless* we threaten to OOM or have
+	  no swap space or vm.swappiness=0.
+
+	  Protection of clean file pages may be used to prevent thrashing and
+	  reduce I/O under low-memory conditions.
+
+	  Setting it to a high value may result in an early eviction of anonymous
+	  pages into the swap space by attempting to hold the protected amount of
+	  clean file pages in memory.
+
+config CLEAN_MIN_KBYTES
+	int "Default value for vm.clean_min_kbytes"
+	depends on SYSCTL
+	default "0"
+	help
+	  The vm.clean_min_kbytes sysctl knob provides *hard* protection
+	  of clean file pages. The clean file pages on the current node won't be
+	  reclaimed under memory pressure when their volume is below
+	  vm.clean_min_kbytes.
+
+	  Hard protection of clean file pages may be used to avoid high latency
+	  and prevent livelock in near-OOM conditions.
+
+	  Setting it to a high value may result in an early out-of-memory
+	  condition due to the inability to reclaim the protected amount of
+	  clean file pages when other types of pages cannot be reclaimed.
+
 config HAVE_MEMBLOCK_PHYS_MAP
 	bool
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7b4e31eac..77e98c43e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -120,6 +120,19 @@ struct scan_control {
 	/* The file pages on the current node are dangerously low */
 	unsigned int file_is_tiny:1;
 
+	/*
+	 * The clean file pages on the current node won't be reclaimed when
+	 * their volume is below vm.clean_low_kbytes *unless* we threaten
+	 * to OOM or have no swap space or vm.swappiness=0.
+	 */
+	unsigned int clean_below_low:1;
+
+	/*
+	 * The clean file pages on the current node won't be reclaimed when
+	 * their volume is below vm.clean_min_kbytes.
+	 */
+	unsigned int clean_below_min:1;
+
 	/* Allocation order */
 	s8 order;
 
@@ -166,6 +179,17 @@ struct scan_control {
 #define prefetchw_prev_lru_page(_page, _base, _field) do { } while (0)
 #endif
 
+#if CONFIG_CLEAN_LOW_KBYTES < 0
+#error "CONFIG_CLEAN_LOW_KBYTES must be >= 0"
+#endif
+
+#if CONFIG_CLEAN_MIN_KBYTES < 0
+#error "CONFIG_CLEAN_MIN_KBYTES must be >= 0"
+#endif
+
+unsigned long sysctl_clean_low_kbytes __read_mostly = CONFIG_CLEAN_LOW_KBYTES;
+unsigned long sysctl_clean_min_kbytes __read_mostly = CONFIG_CLEAN_MIN_KBYTES;
+
 /*
  * From 0 .. 200.  Higher means more swappy.
  */
@@ -2283,6 +2307,16 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	}
 
 	/*
+	 * Force-scan anon if the volume of clean file pages is below
+	 * vm.clean_min_kbytes or vm.clean_low_kbytes (unless the
+	 * swappiness setting disagrees with swapping).
+	 */
+	if ((sc->clean_below_low || sc->clean_below_min) && swappiness) {
+		scan_balance = SCAN_ANON;
+		goto out;
+	}
+
+	/*
 	 * If there is enough inactive page cache, we do not reclaim
 	 * anything from the anonymous working set right now.
 	 */
@@ -2418,6 +2452,13 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 			BUG();
 		}
 
+		/*
+		 * Don't reclaim clean file pages when their volume is below
+		 * vm.clean_min_kbytes.
+		 */
+		if (file && sc->clean_below_min)
+			scan = 0;
+
 		nr[lru] = scan;
 	}
 }
@@ -2768,6 +2809,24 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 			anon >> sc->priority;
 	}
 
+	if (sysctl_clean_low_kbytes || sysctl_clean_min_kbytes) {
+		unsigned long reclaimable_file, dirty, clean;
+
+		reclaimable_file =
+			node_page_state(pgdat, NR_ACTIVE_FILE) +
+			node_page_state(pgdat, NR_INACTIVE_FILE) +
+			node_page_state(pgdat, NR_ISOLATED_FILE);
+		dirty = node_page_state(pgdat, NR_FILE_DIRTY);
+		if (reclaimable_file > dirty)
+			clean = (reclaimable_file - dirty) << (PAGE_SHIFT - 10);
+
+		sc->clean_below_low = clean < sysctl_clean_low_kbytes;
+		sc->clean_below_min = clean < sysctl_clean_min_kbytes;
+	} else {
+		sc->clean_below_low = false;
+		sc->clean_below_min = false;
+	}
+
 	shrink_node_memcgs(pgdat, sc);
 
 	if (reclaim_state) {
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread
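The accounting the patch adds to shrink_node() is simple enough to model in userspace. The sketch below assumes 4 KiB pages (PAGE_SHIFT == 12); inputs are page counts and the thresholds are in kilobytes, matching the sysctls. Note that the model initializes `clean_kbytes` to 0 so the "everything is dirty" case is well-defined:

```c
#include <stdbool.h>

/* Result of the clean-file check, mirroring the two scan_control bits. */
struct clean_state { bool below_low; bool below_min; };

/* Models the sysctl_clean_low_kbytes/sysctl_clean_min_kbytes logic:
 * clean = (active + inactive + isolated file pages - dirty), in kbytes. */
static struct clean_state check_clean(unsigned long active_file,
				      unsigned long inactive_file,
				      unsigned long isolated_file,
				      unsigned long dirty,
				      unsigned long low_kbytes,
				      unsigned long min_kbytes)
{
	const unsigned int page_shift = 12; /* assumes 4 KiB pages */
	unsigned long reclaimable_file = active_file + inactive_file +
					 isolated_file;
	unsigned long clean_kbytes = 0; /* explicitly initialized here */
	struct clean_state st;

	if (reclaimable_file > dirty)
		clean_kbytes = (reclaimable_file - dirty) << (page_shift - 10);

	st.below_low = clean_kbytes < low_kbytes;
	st.below_min = clean_kbytes < min_kbytes;
	return st;
}
```

For example, 200 reclaimable file pages of which 50 are dirty leaves 150 clean pages, i.e. 600 KiB with 4 KiB pages — below a 1000 KiB `low` threshold but above a 100 KiB `min` threshold, so only the best-effort protection would kick in.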
* Re: [PATCH] mm/vmscan: add sysctl knobs for protecting the specified
  2021-04-05 21:59 ` Alexey Avramov
@ 2021-04-06  0:15 ` kernel test robot
  2021-04-06  1:16 ` kernel test robot
  1 sibling, 0 replies; 4+ messages in thread

From: kernel test robot @ 2021-04-06  0:15 UTC (permalink / raw)
  To: Alexey Avramov, Stillinux
  Cc: kbuild-all, akpm, linux-mm, linux-kernel, liuzhengyuan, liuyun01

[-- Attachment #1: Type: text/plain, Size: 3048 bytes --]

Hi Alexey,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linux/master]
[also build test ERROR on linus/master v5.12-rc6 next-20210401]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Alexey-Avramov/mm-vmscan-add-sysctl-knobs-for-protecting-the-specified/20210406-061034
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 5e46d1b78a03d52306f21f77a4e4a144b6d31486
config: parisc-randconfig-m031-20210405 (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/a5eeb8d197a8e10c333422e9cc0f2c7d976a3426
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Alexey-Avramov/mm-vmscan-add-sysctl-knobs-for-protecting-the-specified/20210406-061034
        git checkout a5eeb8d197a8e10c333422e9cc0f2c7d976a3426
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

>> mm/vmscan.c:180:5: warning: "CONFIG_CLEAN_LOW_KBYTES" is not defined, evaluates to 0 [-Wundef]
     180 | #if CONFIG_CLEAN_LOW_KBYTES < 0
         |     ^~~~~~~~~~~~~~~~~~~~~~~
>> mm/vmscan.c:184:5: warning: "CONFIG_CLEAN_MIN_KBYTES" is not defined, evaluates to 0 [-Wundef]
     184 | #if CONFIG_CLEAN_MIN_KBYTES < 0
         |     ^~~~~~~~~~~~~~~~~~~~~~~
>> mm/vmscan.c:188:55: error: 'CONFIG_CLEAN_LOW_KBYTES' undeclared here (not in a function)
     188 | unsigned long sysctl_clean_low_kbytes __read_mostly = CONFIG_CLEAN_LOW_KBYTES;
         |                                                       ^~~~~~~~~~~~~~~~~~~~~~~
>> mm/vmscan.c:189:55: error: 'CONFIG_CLEAN_MIN_KBYTES' undeclared here (not in a function)
     189 | unsigned long sysctl_clean_min_kbytes __read_mostly = CONFIG_CLEAN_MIN_KBYTES;
         |                                                       ^~~~~~~~~~~~~~~~~~~~~~~

vim +/CONFIG_CLEAN_LOW_KBYTES +188 mm/vmscan.c

   179	
 > 180	#if CONFIG_CLEAN_LOW_KBYTES < 0
   181	#error "CONFIG_CLEAN_LOW_KBYTES must be >= 0"
   182	#endif
   183	
 > 184	#if CONFIG_CLEAN_MIN_KBYTES < 0
   185	#error "CONFIG_CLEAN_MIN_KBYTES must be >= 0"
   186	#endif
   187	
 > 188	unsigned long sysctl_clean_low_kbytes __read_mostly = CONFIG_CLEAN_LOW_KBYTES;
 > 189	unsigned long sysctl_clean_min_kbytes __read_mostly = CONFIG_CLEAN_MIN_KBYTES;
   190	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26587 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread
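The two errors above share one root cause: on this randconfig the new Kconfig options are not enabled (presumably because the config lacks SYSCTL, which they depend on), so `CONFIG_CLEAN_LOW_KBYTES` is not defined at the preprocessor level at all — `#if` silently treats it as 0 (hence the -Wundef warnings) and the variable initializer fails outright. A fallback definition, shown below, is one possible way to keep the file building on such configs; this is only an illustration, not the fix the thread settled on:

```c
/* Guard against the Kconfig symbol being entirely undefined, as on the
 * robot's parisc randconfig. With the fallback, both the #if check and
 * the initializer compile cleanly. */
#ifndef CONFIG_CLEAN_LOW_KBYTES
#define CONFIG_CLEAN_LOW_KBYTES 0
#endif
#ifndef CONFIG_CLEAN_MIN_KBYTES
#define CONFIG_CLEAN_MIN_KBYTES 0
#endif

#if CONFIG_CLEAN_LOW_KBYTES < 0
#error "CONFIG_CLEAN_LOW_KBYTES must be >= 0"
#endif
#if CONFIG_CLEAN_MIN_KBYTES < 0
#error "CONFIG_CLEAN_MIN_KBYTES must be >= 0"
#endif

static unsigned long sysctl_clean_low_kbytes = CONFIG_CLEAN_LOW_KBYTES;
static unsigned long sysctl_clean_min_kbytes = CONFIG_CLEAN_MIN_KBYTES;

/* Accessors so the defaults can be checked from a caller. */
static unsigned long clean_low_default(void) { return sysctl_clean_low_kbytes; }
static unsigned long clean_min_default(void) { return sysctl_clean_min_kbytes; }
```

An alternative would be making the Kconfig entries unconditional (dropping `depends on SYSCTL`) or defining the defaults in a header; either way the symbol must exist on every config that compiles mm/vmscan.c.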
* Re: [PATCH] mm/vmscan: add sysctl knobs for protecting the specified
  2021-04-05 21:59 ` Alexey Avramov
  2021-04-06  0:15 ` [PATCH] mm/vmscan: add sysctl knobs for protecting the specified kernel test robot
@ 2021-04-06  1:16 ` kernel test robot
  1 sibling, 0 replies; 4+ messages in thread

From: kernel test robot @ 2021-04-06  1:16 UTC (permalink / raw)
  To: Alexey Avramov, Stillinux
  Cc: kbuild-all, clang-built-linux, akpm, linux-mm, linux-kernel,
      liuzhengyuan, liuyun01

[-- Attachment #1: Type: text/plain, Size: 16131 bytes --]

Hi Alexey,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linux/master]
[also build test WARNING on linus/master v5.12-rc6 next-20210401]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Alexey-Avramov/mm-vmscan-add-sysctl-knobs-for-protecting-the-specified/20210406-061034
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 5e46d1b78a03d52306f21f77a4e4a144b6d31486
config: s390-randconfig-r006-20210405 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project a46f59a747a7273cc439efaf3b4f98d8b63d2f20)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/0day-ci/linux/commit/a5eeb8d197a8e10c333422e9cc0f2c7d976a3426
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Alexey-Avramov/mm-vmscan-add-sysctl-knobs-for-protecting-the-specified/20210406-061034
        git checkout a5eeb8d197a8e10c333422e9cc0f2c7d976a3426
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=s390

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from mm/vmscan.c:20:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:22:
   In file included from include/linux/writeback.h:14:
   In file included from include/linux/blk-cgroup.h:23:
   In file included from include/linux/blkdev.h:26:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:80:
   include/asm-generic/io.h:464:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:36:59: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
                                                             ^
   include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
   #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
                                                        ^
   In file included from mm/vmscan.c:20:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:22:
   In file included from include/linux/writeback.h:14:
   In file included from include/linux/blk-cgroup.h:23:
   In file included from include/linux/blkdev.h:26:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:80:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
   #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
                                                        ^
   In file included from mm/vmscan.c:20:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:22:
   In file included from include/linux/writeback.h:14:
   In file included from include/linux/blk-cgroup.h:23:
   In file included from include/linux/blkdev.h:26:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:80:
   include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:511:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:521:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:609:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsb(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:617:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsw(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:625:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsl(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:634:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesb(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:643:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesw(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:652:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesl(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
>> mm/vmscan.c:2819:7: warning: variable 'clean' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                   if (reclaimable_file > dirty)
                       ^~~~~~~~~~~~~~~~~~~~~~~~
   mm/vmscan.c:2822:25: note: uninitialized use occurs here
                   sc->clean_below_low = clean < sysctl_clean_low_kbytes;
                                         ^~~~~
   mm/vmscan.c:2819:3: note: remove the 'if' if its condition is always true
                   if (reclaimable_file > dirty)
                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   mm/vmscan.c:2812:47: note: initialize the variable 'clean' to silence this warning
                   unsigned long reclaimable_file, dirty, clean;
                                                               ^
                                                                = 0
   13 warnings generated.

vim +2819 mm/vmscan.c

  2706	
  2707	static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
  2708	{
  2709		struct reclaim_state *reclaim_state = current->reclaim_state;
  2710		unsigned long nr_reclaimed, nr_scanned;
  2711		struct lruvec *target_lruvec;
  2712		bool reclaimable = false;
  2713		unsigned long file;
  2714	
  2715		target_lruvec = mem_cgroup_lruvec(sc->target_mem_cgroup, pgdat);
  2716	
  2717	again:
  2718		memset(&sc->nr, 0, sizeof(sc->nr));
  2719	
  2720		nr_reclaimed = sc->nr_reclaimed;
  2721		nr_scanned = sc->nr_scanned;
  2722	
  2723		/*
  2724		 * Determine the scan balance between anon and file LRUs.
  2725		 */
  2726		spin_lock_irq(&target_lruvec->lru_lock);
  2727		sc->anon_cost = target_lruvec->anon_cost;
  2728		sc->file_cost = target_lruvec->file_cost;
  2729		spin_unlock_irq(&target_lruvec->lru_lock);
  2730	
  2731		/*
  2732		 * Target desirable inactive:active list ratios for the anon
  2733		 * and file LRU lists.
  2734		 */
  2735		if (!sc->force_deactivate) {
  2736			unsigned long refaults;
  2737	
  2738			refaults = lruvec_page_state(target_lruvec,
  2739					WORKINGSET_ACTIVATE_ANON);
  2740			if (refaults != target_lruvec->refaults[0] ||
  2741				inactive_is_low(target_lruvec, LRU_INACTIVE_ANON))
  2742				sc->may_deactivate |= DEACTIVATE_ANON;
  2743			else
  2744				sc->may_deactivate &= ~DEACTIVATE_ANON;
  2745	
  2746			/*
  2747			 * When refaults are being observed, it means a new
  2748			 * workingset is being established. Deactivate to get
  2749			 * rid of any stale active pages quickly.
  2750			 */
  2751			refaults = lruvec_page_state(target_lruvec,
  2752					WORKINGSET_ACTIVATE_FILE);
  2753			if (refaults != target_lruvec->refaults[1] ||
  2754				inactive_is_low(target_lruvec, LRU_INACTIVE_FILE))
  2755				sc->may_deactivate |= DEACTIVATE_FILE;
  2756			else
  2757				sc->may_deactivate &= ~DEACTIVATE_FILE;
  2758		} else
  2759			sc->may_deactivate = DEACTIVATE_ANON | DEACTIVATE_FILE;
  2760	
  2761		/*
  2762		 * If we have plenty of inactive file pages that aren't
  2763		 * thrashing, try to reclaim those first before touching
  2764		 * anonymous pages.
  2765		 */
  2766		file = lruvec_page_state(target_lruvec, NR_INACTIVE_FILE);
  2767		if (file >> sc->priority && !(sc->may_deactivate & DEACTIVATE_FILE))
  2768			sc->cache_trim_mode = 1;
  2769		else
  2770			sc->cache_trim_mode = 0;
  2771	
  2772		/*
  2773		 * Prevent the reclaimer from falling into the cache trap: as
  2774		 * cache pages start out inactive, every cache fault will tip
  2775		 * the scan balance towards the file LRU.  And as the file LRU
  2776		 * shrinks, so does the window for rotation from references.
  2777		 * This means we have a runaway feedback loop where a tiny
  2778		 * thrashing file LRU becomes infinitely more attractive than
  2779		 * anon pages.  Try to detect this based on file LRU size.
  2780		 */
  2781		if (!cgroup_reclaim(sc)) {
  2782			unsigned long total_high_wmark = 0;
  2783			unsigned long free, anon;
  2784			int z;
  2785	
  2786			free = sum_zone_node_page_state(pgdat->node_id, NR_FREE_PAGES);
  2787			file = node_page_state(pgdat, NR_ACTIVE_FILE) +
  2788				node_page_state(pgdat, NR_INACTIVE_FILE);
  2789	
  2790			for (z = 0; z < MAX_NR_ZONES; z++) {
  2791				struct zone *zone = &pgdat->node_zones[z];
  2792				if (!managed_zone(zone))
  2793					continue;
  2794	
  2795				total_high_wmark += high_wmark_pages(zone);
  2796			}
  2797	
  2798			/*
  2799			 * Consider anon: if that's low too, this isn't a
  2800			 * runaway file reclaim problem, but rather just
  2801			 * extreme pressure. Reclaim as per usual then.
  2802			 */
  2803			anon = node_page_state(pgdat, NR_INACTIVE_ANON);
  2804	
  2805			sc->file_is_tiny =
  2806				file + free <= total_high_wmark &&
  2807				!(sc->may_deactivate & DEACTIVATE_ANON) &&
  2808				anon >> sc->priority;
  2809		}
  2810	
  2811		if (sysctl_clean_low_kbytes || sysctl_clean_min_kbytes) {
  2812			unsigned long reclaimable_file, dirty, clean;
  2813	
  2814			reclaimable_file =
  2815				node_page_state(pgdat, NR_ACTIVE_FILE) +
  2816				node_page_state(pgdat, NR_INACTIVE_FILE) +
  2817				node_page_state(pgdat, NR_ISOLATED_FILE);
  2818			dirty = node_page_state(pgdat, NR_FILE_DIRTY);
> 2819			if (reclaimable_file > dirty)
  2820				clean = (reclaimable_file - dirty) << (PAGE_SHIFT - 10);
  2821	
  2822			sc->clean_below_low = clean < sysctl_clean_low_kbytes;
  2823			sc->clean_below_min = clean < sysctl_clean_min_kbytes;
  2824		} else {
  2825			sc->clean_below_low = false;
  2826			sc->clean_below_min = false;
  2827		}
  2828	
  2829		shrink_node_memcgs(pgdat, sc);
  2830	
  2831		if (reclaim_state) {
  2832			sc->nr_reclaimed += reclaim_state->reclaimed_slab;
  2833			reclaim_state->reclaimed_slab = 0;
  2834		}
  2835	
  2836		/* Record the subtree's reclaim efficiency */
  2837		vmpressure(sc->gfp_mask, sc->target_mem_cgroup, true,
  2838			   sc->nr_scanned - nr_scanned,
  2839			   sc->nr_reclaimed - nr_reclaimed);
  2840	
  2841		if (sc->nr_reclaimed - nr_reclaimed)
  2842			reclaimable = true;
  2843	
  2844		if (current_is_kswapd()) {
  2845			/*
  2846			 * If reclaim is isolating dirty pages under writeback,
  2847			 * it implies that the long-lived page allocation rate
  2848			 * is exceeding the page laundering rate. Either the
  2849			 * global limits are not being effective at throttling
  2850			 * processes due to the page distribution throughout
  2851			 * zones or there is heavy usage of a slow backing
  2852			 * device. The only option is to throttle from reclaim
  2853			 * context which is not ideal as there is no guarantee
  2854			 * the dirtying process is throttled in the same way
  2855			 * balance_dirty_pages() manages.
  2856			 *
  2857			 * Once a node is flagged PGDAT_WRITEBACK, kswapd will
  2858			 * count the number of pages under pages flagged for
  2859			 * immediate reclaim and stall if any are encountered
  2860			 * in the nr_immediate check below.
  2861			 */
  2862			if (sc->nr.writeback && sc->nr.writeback == sc->nr.taken)
  2863				set_bit(PGDAT_WRITEBACK, &pgdat->flags);
  2864	
  2865			/* Allow kswapd to start writing pages during reclaim.*/
  2866			if (sc->nr.unqueued_dirty == sc->nr.file_taken)
  2867				set_bit(PGDAT_DIRTY, &pgdat->flags);
  2868	
  2869			/*
  2870			 * If kswapd scans pages marked for immediate
  2871			 * reclaim and under writeback (nr_immediate), it
  2872			 * implies that pages are cycling through the LRU
  2873			 * faster than they are written so also forcibly stall.
  2874			 */
  2875			if (sc->nr.immediate)
  2876				congestion_wait(BLK_RW_ASYNC, HZ/10);
  2877		}
  2878	
  2879		/*
  2880		 * Tag a node/memcg as congested if all the dirty pages
  2881		 * scanned were backed by a congested BDI and
  2882		 * wait_iff_congested will stall.
  2883		 *
  2884		 * Legacy memcg will stall in page writeback so avoid forcibly
  2885		 * stalling in wait_iff_congested().
  2886		 */
  2887		if ((current_is_kswapd() ||
  2888		     (cgroup_reclaim(sc) && writeback_throttling_sane(sc))) &&
  2889		    sc->nr.dirty && sc->nr.dirty == sc->nr.congested)
  2890			set_bit(LRUVEC_CONGESTED, &target_lruvec->flags);
  2891	
  2892		/*
  2893		 * Stall direct reclaim for IO completions if underlying BDIs
  2894		 * and node is congested. Allow kswapd to continue until it
  2895		 * starts encountering unqueued dirty pages or cycling through
  2896		 * the LRU too quickly.
  2897		 */
  2898		if (!current_is_kswapd() && current_may_throttle() &&
  2899		    !sc->hibernation_mode &&
  2900		    test_bit(LRUVEC_CONGESTED, &target_lruvec->flags))
  2901			wait_iff_congested(BLK_RW_ASYNC, HZ/10);
  2902	
  2903		if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
  2904					    sc))
  2905			goto again;
  2906	
  2907		/*
  2908		 * Kswapd gives up on balancing particular nodes after too
  2909		 * many failures to reclaim anything from them and goes to
  2910		 * sleep. On reclaim progress, reset the failure counter. A
  2911		 * successful direct reclaim run will revive a dormant kswapd.
  2912		 */
  2913		if (reclaimable)
  2914			pgdat->kswapd_failures = 0;
  2915	}

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 11322 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread
end of thread, other threads:[~2021-04-06  1:17 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAKN5gChSwSs1Zy1r7iXHw7ZSKy7Nkr3NqcqJSn7z9yZPr3J2AA@mail.gmail.com>
2021-04-03  0:44 ` [RFC PATCH] mm/swap: fix system stuck due to infinite loop Andrew Morton
2021-04-05 21:59 ` Alexey Avramov
2021-04-06  0:15   ` [PATCH] mm/vmscan: add sysctl knobs for protecting the specified kernel test robot
2021-04-06  1:16   ` kernel test robot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).