From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>, Zi Yan <ziy@nvidia.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH V4 00/10] mm: page_alloc: freelist migratetype hygiene
Date: Wed, 20 Mar 2024 14:02:05 -0400
Message-ID: <20240320180429.678181-1-hannes@cmpxchg.org>

V4:
- fixed !pcp_order_allowed() case in free_unref_folios()
- reworded the patch 0 changelog a bit for the git log
- rebased to mm-everything-2024-03-19-23-01
- runtime-tested again with various CONFIG_DEBUG_FOOs enabled

---

The page allocator's mobility grouping is intended to keep unmovable
pages separate from reclaimable/compactable ones to allow on-demand
defragmentation for higher-order allocations and huge pages.

Currently, there are several places where accidental type mixing
occurs: an allocation asks for a page of a certain migratetype and
receives another. This ruins pageblocks for compaction, which in turn
makes allocating huge pages more expensive and less reliable.

The series addresses those causes. The last patch adds type checks on
all freelist movements to prevent new violations being introduced.
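
As a rough illustration of what a type check on freelist movements
means, here is a toy user-space model in Python. It is not kernel code:
the `Zone` class, the `block_type` map, the helper names and the
pageblock size are all hypothetical, and the real allocator tracks far
more state. The sketch only shows the invariant: a free page may sit on
a freelist only if the list's migratetype matches the type of the
page's pageblock.

```python
from collections import defaultdict

PAGES_PER_BLOCK = 512  # assumed pageblock size for this toy model


class Zone:
    """Toy model: one freelist (a set of free pfns) per migratetype."""

    def __init__(self):
        self.block_type = {}               # pageblock index -> migratetype
        self.freelists = defaultdict(set)  # migratetype -> free pfns

    def _check(self, pfn, migratetype):
        # The hygiene invariant: list type must equal the block's type.
        actual = self.block_type[pfn // PAGES_PER_BLOCK]
        if actual != migratetype:
            raise AssertionError(
                f"pfn {pfn}: block is {actual}, list is {migratetype}")

    def add_free(self, pfn, migratetype):
        self._check(pfn, migratetype)
        self.freelists[migratetype].add(pfn)

    def del_free(self, pfn, migratetype):
        self._check(pfn, migratetype)
        self.freelists[migratetype].remove(pfn)

    def convert_block(self, block, new_type):
        """Change a block's type and move its free pages along with it."""
        old_type = self.block_type[block]
        lo, hi = block * PAGES_PER_BLOCK, (block + 1) * PAGES_PER_BLOCK
        moved = {p for p in self.freelists[old_type] if lo <= p < hi}
        self.freelists[old_type] -= moved
        self.block_type[block] = new_type
        self.freelists[new_type] |= moved
        return len(moved)
```

In this model, putting a page from a "movable" block on the "unmovable"
list raises immediately, while `convert_block()` keeps the invariant
intact by updating the block type and relocating its free pages in one
step.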

The benefits can be seen in a mixed workload that stresses the machine
with a memcache-type workload and a kernel build job while
periodically attempting to allocate batches of THP. The following data
is aggregated over 50 consecutive defconfig builds:

                                                        VANILLA                 PATCHED
Hugealloc Time mean                      165843.93 (    +0.00%)  113025.88 (   -31.85%)
Hugealloc Time stddev                    158957.35 (    +0.00%)  114716.07 (   -27.83%)
Kbuild Real time                            310.24 (    +0.00%)     300.73 (    -3.06%)
Kbuild User time                           1271.13 (    +0.00%)    1259.42 (    -0.92%)
Kbuild System time                          582.02 (    +0.00%)     559.79 (    -3.81%)
THP fault alloc                           30585.14 (    +0.00%)   40853.62 (   +33.57%)
THP fault fallback                        36626.46 (    +0.00%)   26357.62 (   -28.04%)
THP fault fail rate %                        54.49 (    +0.00%)      39.22 (   -27.53%)
Pagealloc fallback                         1328.00 (    +0.00%)       1.00 (   -99.85%)
Pagealloc type mismatch                  181009.50 (    +0.00%)       0.00 (  -100.00%)
Direct compact stall                        434.56 (    +0.00%)     257.66 (   -40.61%)
Direct compact fail                         421.70 (    +0.00%)     249.94 (   -40.63%)
Direct compact success                       12.86 (    +0.00%)       7.72 (   -37.09%)
Direct compact success rate %                 2.86 (    +0.00%)       2.82 (    -0.96%)
Compact daemon scanned migrate          3370059.62 (    +0.00%) 3612054.76 (    +7.18%)
Compact daemon scanned free             7718439.20 (    +0.00%) 5386385.02 (   -30.21%)
Compact direct scanned migrate           309248.62 (    +0.00%)  176721.04 (   -42.85%)
Compact direct scanned free              433582.84 (    +0.00%)  315727.66 (   -27.18%)
Compact migrate scanned daemon %             91.20 (    +0.00%)      94.48 (    +3.56%)
Compact free scanned daemon %                94.58 (    +0.00%)      94.42 (    -0.16%)
Compact total migrate scanned           3679308.24 (    +0.00%) 3788775.80 (    +2.98%)
Compact total free scanned              8152022.04 (    +0.00%) 5702112.68 (   -30.05%)
Alloc stall                                 872.04 (    +0.00%)    5156.12 (  +490.71%)
Pages kswapd scanned                     510645.86 (    +0.00%)    3394.94 (   -99.33%)
Pages kswapd reclaimed                   134811.62 (    +0.00%)    2701.26 (   -98.00%)
Pages direct scanned                      99546.06 (    +0.00%)  376407.52 (  +278.12%)
Pages direct reclaimed                    62123.40 (    +0.00%)  289535.70 (  +366.06%)
Pages total scanned                      610191.92 (    +0.00%)  379802.46 (   -37.76%)
Pages scanned kswapd %                       76.36 (    +0.00%)       0.10 (   -98.58%)
Swap out                                  12057.54 (    +0.00%)   15022.98 (   +24.59%)
Swap in                                     209.16 (    +0.00%)     256.48 (   +22.52%)
File refaults                             17701.64 (    +0.00%)   11765.40 (   -33.53%)

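Huge page success rate is higher (THP fault fail rate drops from 54.49%
to 39.22%), and allocation latencies are shorter and more predictable
(Hugealloc Time mean -31.85%, stddev -27.83%).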
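
The text does not say how the percentage columns are computed. As an
observation rather than a statement about the tooling used, they appear
consistent with a relative change taken after adding 1 to both values,
which keeps the result defined when the baseline is zero. A minimal
sketch under that assumption, where `rel_change` is a hypothetical
helper:

```python
def rel_change(vanilla, patched):
    """Percent change with +1 smoothing (an inferred formula, not confirmed)."""
    return ((patched + 1.0) / (vanilla + 1.0) - 1.0) * 100.0


# A few (vanilla, patched) pairs copied from the table above.
rows = {
    "Kbuild Real time": (310.24, 300.73),
    "Pagealloc fallback": (1328.00, 1.00),
    "Alloc stall": (872.04, 5156.12),
    "Pages scanned kswapd %": (76.36, 0.10),
}

for name, (vanilla, patched) in rows.items():
    print(f"{name:24s} {rel_change(vanilla, patched):+8.2f}%")
```

For these four rows the sketch reproduces the table's -3.06%, -99.85%,
+490.71% and -98.58%. A plain ratio without the +1 would give -99.92%
for the Pagealloc fallback row instead.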

Stealing (fallback) rate is drastically reduced. Notably, while the
vanilla kernel keeps doing fallbacks on an ongoing basis, the patched
kernel enters a steady state once the distribution of block types is
adequate for the workload. Steals over 50 runs:

VANILLA         PATCHED
1504.0		227.0
1557.0		6.0
1391.0		13.0
1080.0		26.0
1057.0		40.0
1156.0		6.0
805.0		46.0
736.0		20.0
1747.0		2.0
1699.0		34.0
1269.0		13.0
1858.0		12.0
907.0		4.0
727.0		2.0
563.0		2.0
3094.0		2.0
10211.0		3.0
2621.0		1.0
5508.0		2.0
1060.0		2.0
538.0		3.0
5773.0		2.0
2199.0		0.0
3781.0		2.0
1387.0		1.0
4977.0		0.0
2865.0		1.0
1814.0		1.0
3739.0		1.0
6857.0		0.0
382.0		0.0
407.0		1.0
3784.0		0.0
297.0		0.0
298.0		0.0
6636.0		0.0
4188.0		0.0
242.0		0.0
9960.0		0.0
5816.0		0.0
354.0		0.0
287.0		0.0
261.0		0.0
140.0		1.0
2065.0		0.0
312.0		0.0
331.0		0.0
164.0		0.0
465.0		1.0
219.0		0.0
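
To make the steady-state claim concrete, the two columns above can be
summarized with a few lines of Python. The data is copied from the
listing; the `summarize` helper and the split into first and last ten
runs are arbitrary choices for illustration.

```python
from statistics import median

vanilla = [
    1504, 1557, 1391, 1080, 1057, 1156, 805, 736, 1747, 1699,
    1269, 1858, 907, 727, 563, 3094, 10211, 2621, 5508, 1060,
    538, 5773, 2199, 3781, 1387, 4977, 2865, 1814, 3739, 6857,
    382, 407, 3784, 297, 298, 6636, 4188, 242, 9960, 5816,
    354, 287, 261, 140, 2065, 312, 331, 164, 465, 219,
]
patched = [
    227, 6, 13, 26, 40, 6, 46, 20, 2, 34,
    13, 12, 4, 2, 2, 2, 3, 1, 2, 2,
    3, 2, 0, 2, 1, 0, 1, 1, 1, 0,
    0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 1, 0, 0, 0, 0, 1, 0,
]


def summarize(steals):
    """Median over all runs plus totals for the first and last ten runs."""
    return {
        "median": median(steals),
        "first10": sum(steals[:10]),
        "last10": sum(steals[-10:]),
    }


print("vanilla", summarize(vanilla))
print("patched", summarize(patched))
```

The medians come out as 1328.0 and 1.0, which coincides with the
"Pagealloc fallback" row of the table and suggests that row reports a
median. The patched kernel goes from 420 steals in the first ten runs to
2 in the last ten, while the vanilla kernel still records 4598 in its
last ten.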

Type mismatches are down too. A mismatch is counted every time an
allocation request asks for one migratetype and gets another. This can
still occur occasionally in the patched kernel due to non-stealing
fallbacks, but it is rare and follows the pattern of overall fallbacks:
once the block type distribution settles, mismatches cease as well.
Mismatches over 50 runs:

VANILLA         PATCHED
182602.0	268.0
135794.0	20.0
88619.0		19.0
95973.0		0.0
129590.0	0.0
129298.0	0.0
147134.0	0.0
230854.0	0.0
239709.0	0.0
137670.0	0.0
132430.0	0.0
65712.0		0.0
57901.0		0.0
67506.0		0.0
63565.0		4.0
34806.0		0.0
42962.0		0.0
32406.0		0.0
38668.0		0.0
61356.0		0.0
57800.0		0.0
41435.0		0.0
83456.0		0.0
65048.0		0.0
28955.0		0.0
47597.0		0.0
75117.0		0.0
55564.0		0.0
38280.0		0.0
52404.0		0.0
26264.0		0.0
37538.0		0.0
19671.0		0.0
30936.0		0.0
26933.0		0.0
16962.0		0.0
44554.0		0.0
46352.0		0.0
24995.0		0.0
35152.0		0.0
12823.0		0.0
21583.0		0.0
18129.0		0.0
31693.0		0.0
28745.0		0.0
33308.0		0.0
31114.0		0.0
35034.0		0.0
12111.0		0.0
24885.0		0.0
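
The difference between a stealing fallback and a non-stealing one can
be sketched with a toy counter. This is a simplified model, not the
kernel's logic: `ToyAllocator` is hypothetical, and the rule that only a
completely free block may be claimed is an assumption made to keep the
example short.

```python
class ToyAllocator:
    def __init__(self, blocks):
        # blocks: list of [migratetype, free_pages, total_pages]
        self.blocks = [list(b) for b in blocks]
        self.steals = 0
        self.mismatches = 0

    def alloc(self, requested):
        """Return the migratetype of the block that served the request."""
        # Preferred path: a block of the requested type with a free page.
        for blk in self.blocks:
            if blk[0] == requested and blk[1] > 0:
                blk[1] -= 1
                return blk[0]
        # Fallback path: take a page from a block of another type.
        for blk in self.blocks:
            if blk[1] > 0:
                if blk[1] == blk[2]:
                    # Assumed rule: a fully free block can be claimed
                    # (stolen) and converted to the requested type.
                    blk[0] = requested
                    self.steals += 1
                else:
                    # Non-stealing fallback: the block keeps its type,
                    # so the caller gets a type it did not ask for.
                    self.mismatches += 1
                blk[1] -= 1
                return blk[0]
        return None  # nothing free anywhere


alloc = ToyAllocator([["movable", 2, 2], ["unmovable", 1, 4]])
served = [alloc.alloc(t) for t in ("unmovable", "unmovable", "movable")]
print(served, "steals:", alloc.steals, "mismatches:", alloc.mismatches)
```

In this run the second request converts the free movable block and is
counted as a steal without a mismatch; the third request can only be
served from the now-unmovable block and is counted as a mismatch.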

Compaction work is markedly reduced despite much better THP rates.

In the vanilla kernel, reclaim seems to have been driven primarily by
watermark boosting that happens as a result of fallbacks. With those
all but eliminated, watermarks average lower and kswapd does less
work. The uptick in direct reclaim is because THP requests have to
fend for themselves more often - which is intended policy right
now. Aggregate reclaim activity is lowered significantly, though.

---

V3:
- fixed freelist type violations from non-atomic page isolation
  updates (Zi Yan)
- fixed incorrect migratetype update ordering during merge (Vlastimil Babka)
- reject moving a zone-straddling block altogether (Vlastimil Babka)
- fixed freelist type violations from lockless migratetype lookups in
  cornercase freeing paths (Vlastimil Babka)
- fixed erroneous WARN in the bulk freeing path that was intended to catch
  mistakes in the now-removed pcpcache (Mike Kravetz)
- fixed typo in patch 1's changelog (Zi Yan)
- optimized migratetype lookup in free_unref_page_list() (Vlastimil Babka)
- batched vmstat updates in page merging hot path (Vlastimil Babka)
- rebased to mm-everything-2024-03-05-20-43 (v6.8-rc5+)

V2:
- dropped the get_pfnblock_migratetype() optimization
  patchlet since somebody else beat me to it (thanks Zi)
- broke out pcp bypass fix since somebody else reported the bug:
  https://lore.kernel.org/linux-mm/20230911181108.GA104295@cmpxchg.org/
- fixed the CONFIG_UNACCEPTED_MEMORY build (lkp)
- rebased to v6.6-rc1

 include/linux/mm.h             |  18 +-
 include/linux/page-isolation.h |   5 +-
 include/linux/vmstat.h         |   8 -
 mm/debug_page_alloc.c          |  12 +-
 mm/internal.h                  |   9 -
 mm/page_alloc.c                | 650 +++++++++++++++++++++------------------
 mm/page_isolation.c            | 122 +++-----
 7 files changed, 415 insertions(+), 409 deletions(-)

Based on mm-everything-2024-03-19-23-01.




Thread overview: 55+ messages
2024-03-20 18:02 Johannes Weiner [this message]
2024-03-20 18:02 ` [PATCH 01/10] mm: page_alloc: remove pcppage migratetype caching Johannes Weiner
2024-03-20 18:02 ` [PATCH 02/10] mm: page_alloc: optimize free_unref_folios() Johannes Weiner
2024-03-25 15:56   ` Vlastimil Babka
2024-03-20 18:02 ` [PATCH 03/10] mm: page_alloc: fix up block types when merging compatible blocks Johannes Weiner
2024-03-20 18:02 ` [PATCH 04/10] mm: page_alloc: move free pages when converting block during isolation Johannes Weiner
2024-03-20 18:02 ` [PATCH 05/10] mm: page_alloc: fix move_freepages_block() range error Johannes Weiner
2024-03-25 16:22   ` Vlastimil Babka
2024-03-20 18:02 ` [PATCH 06/10] mm: page_alloc: fix freelist movement during block conversion Johannes Weiner
2024-03-26 11:28   ` Vlastimil Babka
2024-03-26 12:34     ` Johannes Weiner
2024-04-05 12:11   ` Baolin Wang
2024-04-05 16:56     ` Johannes Weiner
2024-04-07  6:58       ` Baolin Wang
2024-04-08  7:24       ` Vlastimil Babka
2024-04-09  6:21       ` Vlastimil Babka
2024-03-20 18:02 ` [PATCH 07/10] mm: page_alloc: close migratetype race between freeing and stealing Johannes Weiner
2024-03-26 15:25   ` Vlastimil Babka
2024-03-20 18:02 ` [PATCH 08/10] mm: page_alloc: set migratetype inside move_freepages() Johannes Weiner
2024-03-26 15:40   ` Vlastimil Babka
2024-03-20 18:02 ` [PATCH 09/10] mm: page_isolation: prepare for hygienic freelists Johannes Weiner
2024-03-21 13:13   ` kernel test robot
2024-03-21 14:24     ` Johannes Weiner
2024-03-21 15:03       ` Zi Yan
2024-03-27  8:06   ` Vlastimil Babka
2024-03-20 18:02 ` [PATCH 10/10] mm: page_alloc: consolidate free page accounting Johannes Weiner
2024-03-27  8:54   ` Vlastimil Babka
2024-03-27 14:32     ` Johannes Weiner
2024-03-27 18:57     ` [PATCH 1/3] mm: page_alloc: consolidate free page accounting fix Johannes Weiner
2024-03-27 18:58     ` [PATCH 2/3] mm: page_alloc: consolidate free page accounting fix 2 Johannes Weiner
2024-03-27 19:01     ` [PATCH 3/3] mm: page_alloc: batch vmstat updates in expand() Johannes Weiner
2024-03-27 20:35       ` Vlastimil Babka
2024-04-07 10:19   ` [PATCH 10/10] mm: page_alloc: consolidate free page accounting Baolin Wang
2024-04-08  7:38     ` Vlastimil Babka
2024-04-08  9:13       ` Baolin Wang
2024-04-08 14:23       ` Johannes Weiner
2024-04-09  6:23         ` Vlastimil Babka
2024-04-09  7:48           ` [PATCH] mm: page_alloc: consolidate free page accounting fix 3 Baolin Wang
2024-04-09 21:15             ` kernel test robot
2024-04-09 22:36               ` Johannes Weiner
2024-04-09 21:25             ` kernel test robot
2024-04-09  7:56           ` [PATCH 10/10] mm: page_alloc: consolidate free page accounting Baolin Wang
2024-04-09  8:41             ` Vlastimil Babka
2024-04-09  9:31         ` Baolin Wang
2024-04-09 14:46           ` Zi Yan
2024-04-10  8:49             ` Baolin Wang
2024-03-27  9:30 ` [PATCH V4 00/10] mm: page_alloc: freelist migratetype hygiene Vlastimil Babka
2024-03-27 13:10   ` Zi Yan
2024-03-27 14:29   ` Johannes Weiner
2024-04-08  9:30 ` Baolin Wang
2024-04-08 14:24   ` Johannes Weiner
2024-05-11  5:14 ` Yu Zhao
2024-05-13 16:03   ` Johannes Weiner
2024-05-13 18:10     ` Yu Zhao
2024-05-13 19:04       ` Johannes Weiner
