LKML Archive mirror
* [PATCH 0/2] mm: introduce per-order mTHP split counters
@ 2024-04-24 13:51 Lance Yang
  2024-04-24 13:51 ` [PATCH 1/2] mm: add " Lance Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Lance Yang @ 2024-04-24 13:51 UTC
  To: akpm
  Cc: 21cnbao, ryan.roberts, david, baolin.wang, linux-kernel, linux-mm,
	Lance Yang

Hi all,

At present, the split counters in THP statistics no longer include
PTE-mapped mTHP. Therefore, we want to introduce per-order mTHP split
counters to monitor the frequency of mTHP splits. This will assist
developers in better analyzing and optimizing system performance.

/sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
        split_page
        split_page_failed
        deferred_split_page

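For illustration, the three per-size files listed above could be polled from userspace with a small helper like the one below. This is only a sketch: the paths and file names are taken from the listing in this cover letter, and the base directory is a parameter so the helper degrades gracefully on kernels without this series.

```python
from pathlib import Path

# Counter file names as proposed in this series.
COUNTERS = ("split_page", "split_page_failed", "deferred_split_page")

def read_mthp_split_stats(base="/sys/kernel/mm/transparent_hugepage"):
    """Return {"hugepages-<size>": {counter: value, ...}, ...}.

    Sizes or counter files that are absent (e.g. on kernels without
    this series) are silently skipped.
    """
    base = Path(base)
    if not base.is_dir():
        return {}
    stats = {}
    for d in sorted(base.glob("hugepages-*")):
        per_size = {}
        for name in COUNTERS:
            f = d / "stats" / name
            if f.is_file():
                per_size[name] = int(f.read_text())
        if per_size:
            stats[d.name] = per_size
    return stats

if __name__ == "__main__":
    for size, counters in read_mthp_split_stats().items():
        print(size, counters)
```

On a kernel with this series applied it prints one line per enabled mTHP size; elsewhere it prints nothing.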
Thanks,
Lance
---

Lance Yang (2):
 mm: add per-order mTHP split counters
 mm: add docs for per-order mTHP split counters

 Documentation/admin-guide/mm/transhuge.rst | 16 ----------------
 include/linux/huge_mm.h                    |  3 ---
 mm/huge_memory.c                           | 14 ++------------
 3 files changed, 2 insertions(+), 31 deletions(-)

-- 
2.33.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/2] mm: add per-order mTHP split counters
  2024-04-24 13:51 [PATCH 0/2] mm: introduce per-order mTHP split counters Lance Yang
@ 2024-04-24 13:51 ` Lance Yang
  2024-04-24 15:41   ` Ryan Roberts
                     ` (2 more replies)
  2024-04-24 13:51 ` [PATCH 2/2] mm: add docs for " Lance Yang
  2024-04-24 15:00 ` [PATCH 0/2] mm: introduce " David Hildenbrand
  2 siblings, 3 replies; 16+ messages in thread
From: Lance Yang @ 2024-04-24 13:51 UTC
  To: akpm
  Cc: 21cnbao, ryan.roberts, david, baolin.wang, linux-kernel, linux-mm,
	Lance Yang

At present, the split counters in THP statistics no longer include
PTE-mapped mTHP. Therefore, this commit introduces per-order mTHP split
counters to monitor the frequency of mTHP splits. This will assist
developers in better analyzing and optimizing system performance.

/sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
        split_page
        split_page_failed
        deferred_split_page

Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
 include/linux/huge_mm.h |  3 +++
 mm/huge_memory.c        | 14 ++++++++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 56c7ea73090b..7b9c6590e1f7 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -272,6 +272,9 @@ enum mthp_stat_item {
 	MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
 	MTHP_STAT_ANON_SWPOUT,
 	MTHP_STAT_ANON_SWPOUT_FALLBACK,
+	MTHP_STAT_SPLIT_PAGE,
+	MTHP_STAT_SPLIT_PAGE_FAILED,
+	MTHP_STAT_DEFERRED_SPLIT_PAGE,
 	__MTHP_STAT_COUNT
 };
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 055df5aac7c3..52db888e47a6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -557,6 +557,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
 DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
 DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
 DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(split_page, MTHP_STAT_SPLIT_PAGE);
+DEFINE_MTHP_STAT_ATTR(split_page_failed, MTHP_STAT_SPLIT_PAGE_FAILED);
+DEFINE_MTHP_STAT_ATTR(deferred_split_page, MTHP_STAT_DEFERRED_SPLIT_PAGE);
 
 static struct attribute *stats_attrs[] = {
 	&anon_fault_alloc_attr.attr,
@@ -564,6 +567,9 @@ static struct attribute *stats_attrs[] = {
 	&anon_fault_fallback_charge_attr.attr,
 	&anon_swpout_attr.attr,
 	&anon_swpout_fallback_attr.attr,
+	&split_page_attr.attr,
+	&split_page_failed_attr.attr,
+	&deferred_split_page_attr.attr,
 	NULL,
 };
 
@@ -3083,7 +3089,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 	XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
 	struct anon_vma *anon_vma = NULL;
 	struct address_space *mapping = NULL;
-	bool is_thp = folio_test_pmd_mappable(folio);
+	int order = folio_order(folio);
 	int extra_pins, ret;
 	pgoff_t end;
 	bool is_hzp;
@@ -3262,8 +3268,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 		i_mmap_unlock_read(mapping);
 out:
 	xas_destroy(&xas);
-	if (is_thp)
+	if (order >= HPAGE_PMD_ORDER)
 		count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
+	count_mthp_stat(order, !ret ? MTHP_STAT_SPLIT_PAGE :
+				      MTHP_STAT_SPLIT_PAGE_FAILED);
 	return ret;
 }
 
@@ -3327,6 +3335,8 @@ void deferred_split_folio(struct folio *folio)
 	if (list_empty(&folio->_deferred_list)) {
 		if (folio_test_pmd_mappable(folio))
 			count_vm_event(THP_DEFERRED_SPLIT_PAGE);
+		count_mthp_stat(folio_order(folio),
+				MTHP_STAT_DEFERRED_SPLIT_PAGE);
 		list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
 		ds_queue->split_queue_len++;
 #ifdef CONFIG_MEMCG
-- 
2.33.1

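The two counting sites in the patch above (the split path and the deferred-split path) boil down to a small amount of bookkeeping, which the following standalone model mirrors. This is Python rather than kernel C and is for illustration only: HPAGE_PMD_ORDER is hard-coded to 9 (as with 4K base pages), the order range is simplified, and the real kernel uses per-CPU counters.

```python
HPAGE_PMD_ORDER = 9  # illustrative; 2MB PMD with 4K base pages

class MthpSplitStats:
    """Toy model of the per-order counters added by this patch."""

    def __init__(self, max_order=HPAGE_PMD_ORDER):
        keys = ("split_page", "split_page_failed", "deferred_split_page")
        self.stats = {o: dict.fromkeys(keys, 0) for o in range(2, max_order + 1)}
        self.thp_split_page = 0          # legacy /proc/vmstat events
        self.thp_split_page_failed = 0

    def split_huge_page(self, order, ok):
        # Legacy THP events fire only for PMD-sized folios...
        if order >= HPAGE_PMD_ORDER:
            if ok:
                self.thp_split_page += 1
            else:
                self.thp_split_page_failed += 1
        # ...whereas the per-order counters fire for every mTHP order.
        key = "split_page" if ok else "split_page_failed"
        self.stats[order][key] += 1

    def deferred_split(self, order):
        # Mirrors count_mthp_stat() in deferred_split_folio().
        self.stats[order]["deferred_split_page"] += 1
```

The point of the model is the asymmetry: the legacy events are incremented only at PMD order, while the new counters cover every order.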


* [PATCH 2/2] mm: add docs for per-order mTHP split counters
  2024-04-24 13:51 [PATCH 0/2] mm: introduce per-order mTHP split counters Lance Yang
  2024-04-24 13:51 ` [PATCH 1/2] mm: add " Lance Yang
@ 2024-04-24 13:51 ` Lance Yang
  2024-04-24 15:34   ` Ryan Roberts
  2024-04-24 15:00 ` [PATCH 0/2] mm: introduce " David Hildenbrand
  2 siblings, 1 reply; 16+ messages in thread
From: Lance Yang @ 2024-04-24 13:51 UTC
  To: akpm
  Cc: 21cnbao, ryan.roberts, david, baolin.wang, linux-kernel, linux-mm,
	Lance Yang

This commit introduces documentation for mTHP split counters in
transhuge.rst.

Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
 Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index f82300b9193f..35d574a531c8 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -475,6 +475,22 @@ anon_swpout_fallback
 	Usually because failed to allocate some continuous swap space
 	for the huge page.
 
+split_page
+	is incremented every time a huge page is split into base
+	pages. This can happen for a variety of reasons but a common
+	reason is that a huge page is old and is being reclaimed.
+	This action implies splitting all PMD/PTE mapped with the huge page.
+
+split_page_failed
+	is incremented if kernel fails to split huge
+	page. This can happen if the page was pinned by somebody.
+
+deferred_split_page
+	is incremented when a huge page is put onto split
+	queue. This happens when a huge page is partially unmapped and
+	splitting it would free up some memory. Pages on split queue are
+	going to be split under memory pressure.
+
 As the system ages, allocating huge pages may be expensive as the
 system uses memory compaction to copy data around memory to free a
 huge page for use. There are some counters in ``/proc/vmstat`` to help
-- 
2.33.1



* Re: [PATCH 0/2] mm: introduce per-order mTHP split counters
  2024-04-24 13:51 [PATCH 0/2] mm: introduce per-order mTHP split counters Lance Yang
  2024-04-24 13:51 ` [PATCH 1/2] mm: add " Lance Yang
  2024-04-24 13:51 ` [PATCH 2/2] mm: add docs for " Lance Yang
@ 2024-04-24 15:00 ` David Hildenbrand
  2024-04-24 15:20   ` Ryan Roberts
  2 siblings, 1 reply; 16+ messages in thread
From: David Hildenbrand @ 2024-04-24 15:00 UTC
  To: Lance Yang, akpm
  Cc: 21cnbao, ryan.roberts, baolin.wang, linux-kernel, linux-mm

On 24.04.24 15:51, Lance Yang wrote:
> Hi all,
> 
> At present, the split counters in THP statistics no longer include
> PTE-mapped mTHP. Therefore, we want to introduce per-order mTHP split
> counters to monitor the frequency of mTHP splits. This will assist
> developers in better analyzing and optimizing system performance.
> 
> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
>          split_page
>          split_page_failed
>          deferred_split_page
> 
> Thanks,
> Lance
> ---
> 
> Lance Yang (2):
>   mm: add per-order mTHP split counters
>   mm: add docs for per-order mTHP split counters
> 
>   Documentation/admin-guide/mm/transhuge.rst | 16 ----------------

We really have to start documenting these, and what the semantics are.

E.g., is split_page_failed contained in split_page? Is 
deferred_split_page contained in split_page?

But also: just don't call it "split_page". Drop the "_page".

split
split_failed
split_deferred

?

-- 
Cheers,

David / dhildenb



* Re: [PATCH 0/2] mm: introduce per-order mTHP split counters
  2024-04-24 15:00 ` [PATCH 0/2] mm: introduce " David Hildenbrand
@ 2024-04-24 15:20   ` Ryan Roberts
  2024-04-24 15:29     ` David Hildenbrand
  2024-04-24 15:54     ` Lance Yang
  0 siblings, 2 replies; 16+ messages in thread
From: Ryan Roberts @ 2024-04-24 15:20 UTC
  To: David Hildenbrand, Lance Yang, akpm
  Cc: 21cnbao, baolin.wang, linux-kernel, linux-mm

On 24/04/2024 16:00, David Hildenbrand wrote:
> On 24.04.24 15:51, Lance Yang wrote:
>> Hi all,
>>
>> At present, the split counters in THP statistics no longer include
>> PTE-mapped mTHP. Therefore, we want to introduce per-order mTHP split
>> counters to monitor the frequency of mTHP splits. This will assist
>> developers in better analyzing and optimizing system performance.
>>
>> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
>>          split_page
>>          split_page_failed
>>          deferred_split_page
>>
>> Thanks,
>> Lance
>> ---
>>
>> Lance Yang (2):
>>   mm: add per-order mTHP split counters
>>   mm: add docs for per-order mTHP split counters
>>
>>   Documentation/admin-guide/mm/transhuge.rst | 16 ----------------
> 
> We really have to start documenting these, and what the semantics are.

I think the diffstat is backwards; the series definitely adds more lines than it
removes. And patch 2 is adding 16 lines of docs, not removing them. How are you
generating this? `git format-patch` should do it correctly for you.

> 
> E.g., is split_page_failed contained in split_page? Is deferred_split_page
> contained in split_page?
> 
> But also: just don't call it "split_page". Drop the "_page".
> 
> split
> split_failed
> split_deferred

I guess we are back in "should we be consistent with the existing vmstats"
territory, which uses split_page/split_page_failed/deferred_split_page

But here, I agree that dropping _page is nicer.

> 
> ?
> 



* Re: [PATCH 0/2] mm: introduce per-order mTHP split counters
  2024-04-24 15:20   ` Ryan Roberts
@ 2024-04-24 15:29     ` David Hildenbrand
  2024-04-24 15:53       ` Lance Yang
  2024-04-24 15:54     ` Lance Yang
  1 sibling, 1 reply; 16+ messages in thread
From: David Hildenbrand @ 2024-04-24 15:29 UTC
  To: Ryan Roberts, Lance Yang, akpm
  Cc: 21cnbao, baolin.wang, linux-kernel, linux-mm

On 24.04.24 17:20, Ryan Roberts wrote:
> On 24/04/2024 16:00, David Hildenbrand wrote:
>> On 24.04.24 15:51, Lance Yang wrote:
>>> Hi all,
>>>
>>> At present, the split counters in THP statistics no longer include
>>> PTE-mapped mTHP. Therefore, we want to introduce per-order mTHP split
>>> counters to monitor the frequency of mTHP splits. This will assist
>>> developers in better analyzing and optimizing system performance.
>>>
>>> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
>>>           split_page
>>>           split_page_failed
>>>           deferred_split_page
>>>
>>> Thanks,
>>> Lance
>>> ---
>>>
>>> Lance Yang (2):
>>>    mm: add per-order mTHP split counters
>>>    mm: add docs for per-order mTHP split counters
>>>
>>>    Documentation/admin-guide/mm/transhuge.rst | 16 ----------------
>>
>> We really have to start documenting these, and what the semantics are.
> 
> I think the diffstat is backwards; the series definitely adds more lines than it
> removes. And patch 2 is adding 16 lines of docs, not removing them. How are you
> generating this? `git format-patch` should do it correctly for you.
> 
>>
>> E.g., is split_page_failed contained in split_page? Is deferred_split_page
>> contained in split_page?
>>
>> But also: just don't call it "split_page". Drop the "_page".
>>
>> split
>> split_failed
>> split_deferred
> 
> I guess we are back in "should we be consistent with the existing vmstats"
> territory, which uses split_page/split_page_failed/deferred_split_page
> 

Yeah, "thp_split_page" really is odd: "transparent huge page split page".

> But here, I agree that dropping _page is nicer.

Right; we also shouldn't call it "thp_split_page" here :)

-- 
Cheers,

David / dhildenb



* Re: [PATCH 2/2] mm: add docs for per-order mTHP split counters
  2024-04-24 13:51 ` [PATCH 2/2] mm: add docs for " Lance Yang
@ 2024-04-24 15:34   ` Ryan Roberts
  2024-04-25  5:26     ` Lance Yang
  0 siblings, 1 reply; 16+ messages in thread
From: Ryan Roberts @ 2024-04-24 15:34 UTC
  To: Lance Yang, akpm; +Cc: 21cnbao, david, baolin.wang, linux-kernel, linux-mm

On 24/04/2024 14:51, Lance Yang wrote:
> This commit introduces documentation for mTHP split counters in
> transhuge.rst.
> 
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> ---
>  Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index f82300b9193f..35d574a531c8 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -475,6 +475,22 @@ anon_swpout_fallback
>  	Usually because failed to allocate some continuous swap space
>  	for the huge page.
>  
> +split_page
> +	is incremented every time a huge page is split into base

perhaps "...successfully split into base..." to make it clear that this is only
incremented on success.

> +	pages. This can happen for a variety of reasons but a common
> +	reason is that a huge page is old and is being reclaimed.
> +	This action implies splitting all PMD/PTE mapped with the huge page.

What does it mean to "split all PTE"? It's already at its smallest granularity.
Perhaps "This action implies splitting any block mappings into PTEs."?

> +
> +split_page_failed
> +	is incremented if kernel fails to split huge
> +	page. This can happen if the page was pinned by somebody.
> +
> +deferred_split_page
> +	is incremented when a huge page is put onto split
> +	queue. This happens when a huge page is partially unmapped and
> +	splitting it would free up some memory. Pages on split queue are
> +	going to be split under memory pressure.
> +
>  As the system ages, allocating huge pages may be expensive as the
>  system uses memory compaction to copy data around memory to free a
>  huge page for use. There are some counters in ``/proc/vmstat`` to help



* Re: [PATCH 1/2] mm: add per-order mTHP split counters
  2024-04-24 13:51 ` [PATCH 1/2] mm: add " Lance Yang
@ 2024-04-24 15:41   ` Ryan Roberts
  2024-04-24 17:12   ` Bang Li
  2024-04-24 19:44   ` Yang Shi
  2 siblings, 0 replies; 16+ messages in thread
From: Ryan Roberts @ 2024-04-24 15:41 UTC
  To: Lance Yang, akpm, Barry Song
  Cc: 21cnbao, david, baolin.wang, linux-kernel, linux-mm

+ Barry

On 24/04/2024 14:51, Lance Yang wrote:
> At present, the split counters in THP statistics no longer include
> PTE-mapped mTHP. Therefore, this commit introduces per-order mTHP split
> counters to monitor the frequency of mTHP splits. This will assist
> developers in better analyzing and optimizing system performance.
> 
> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
>         split_page
>         split_page_failed
>         deferred_split_page
> 
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> ---
>  include/linux/huge_mm.h |  3 +++
>  mm/huge_memory.c        | 14 ++++++++++++--
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 56c7ea73090b..7b9c6590e1f7 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -272,6 +272,9 @@ enum mthp_stat_item {
>  	MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
>  	MTHP_STAT_ANON_SWPOUT,
>  	MTHP_STAT_ANON_SWPOUT_FALLBACK,
> +	MTHP_STAT_SPLIT_PAGE,
> +	MTHP_STAT_SPLIT_PAGE_FAILED,
> +	MTHP_STAT_DEFERRED_SPLIT_PAGE,
>  	__MTHP_STAT_COUNT
>  };
>  
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 055df5aac7c3..52db888e47a6 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -557,6 +557,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
>  DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>  DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
>  DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
> +DEFINE_MTHP_STAT_ATTR(split_page, MTHP_STAT_SPLIT_PAGE);
> +DEFINE_MTHP_STAT_ATTR(split_page_failed, MTHP_STAT_SPLIT_PAGE_FAILED);
> +DEFINE_MTHP_STAT_ATTR(deferred_split_page, MTHP_STAT_DEFERRED_SPLIT_PAGE);
>  
>  static struct attribute *stats_attrs[] = {
>  	&anon_fault_alloc_attr.attr,
> @@ -564,6 +567,9 @@ static struct attribute *stats_attrs[] = {
>  	&anon_fault_fallback_charge_attr.attr,
>  	&anon_swpout_attr.attr,
>  	&anon_swpout_fallback_attr.attr,
> +	&split_page_attr.attr,
> +	&split_page_failed_attr.attr,
> +	&deferred_split_page_attr.attr,
>  	NULL,
>  };
>  
> @@ -3083,7 +3089,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>  	XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
>  	struct anon_vma *anon_vma = NULL;
>  	struct address_space *mapping = NULL;
> -	bool is_thp = folio_test_pmd_mappable(folio);
> +	int order = folio_order(folio);
>  	int extra_pins, ret;
>  	pgoff_t end;
>  	bool is_hzp;
> @@ -3262,8 +3268,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>  		i_mmap_unlock_read(mapping);
>  out:
>  	xas_destroy(&xas);
> -	if (is_thp)
> +	if (order >= HPAGE_PMD_ORDER)
>  		count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
> +	count_mthp_stat(order, !ret ? MTHP_STAT_SPLIT_PAGE :
> +				      MTHP_STAT_SPLIT_PAGE_FAILED);
>  	return ret;
>  }
>  
> @@ -3327,6 +3335,8 @@ void deferred_split_folio(struct folio *folio)
>  	if (list_empty(&folio->_deferred_list)) {
>  		if (folio_test_pmd_mappable(folio))
>  			count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> +		count_mthp_stat(folio_order(folio),
> +				MTHP_STAT_DEFERRED_SPLIT_PAGE);

There is a very long conversation with Barry about adding a 'global "mTHP became
partially mapped in 1 or more processes" counter (inc only)', which terminates at
[1]. There is a lot of discussion about the required semantics around the need
for partial map to cover alignment and contiguity as well as whether all pages
are mapped, and to trigger once it becomes partial in at least 1 process.

MTHP_STAT_DEFERRED_SPLIT_PAGE gives much simpler semantics, but less
information as a result. Barry, what's your view here? I'm guessing this doesn't
quite solve what you are looking for?

[1] https://lore.kernel.org/linux-mm/6cc7d781-884f-4d8f-a175-8609732b87eb@arm.com/

Thanks,
Ryan

>  		list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
>  		ds_queue->split_queue_len++;
>  #ifdef CONFIG_MEMCG



* Re: [PATCH 0/2] mm: introduce per-order mTHP split counters
  2024-04-24 15:29     ` David Hildenbrand
@ 2024-04-24 15:53       ` Lance Yang
  0 siblings, 0 replies; 16+ messages in thread
From: Lance Yang @ 2024-04-24 15:53 UTC
  To: David Hildenbrand
  Cc: Ryan Roberts, akpm, 21cnbao, baolin.wang, linux-kernel, linux-mm

On Wed, Apr 24, 2024 at 11:29 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 24.04.24 17:20, Ryan Roberts wrote:
> > On 24/04/2024 16:00, David Hildenbrand wrote:
> >> On 24.04.24 15:51, Lance Yang wrote:
> >>> Hi all,
> >>>
> >>> At present, the split counters in THP statistics no longer include
> >>> PTE-mapped mTHP. Therefore, we want to introduce per-order mTHP split
> >>> counters to monitor the frequency of mTHP splits. This will assist
> >>> developers in better analyzing and optimizing system performance.
> >>>
> >>> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
> >>>           split_page
> >>>           split_page_failed
> >>>           deferred_split_page
> >>>
> >>> Thanks,
> >>> Lance
> >>> ---
> >>>
> >>> Lance Yang (2):
> >>>    mm: add per-order mTHP split counters
> >>>    mm: add docs for per-order mTHP split counters
> >>>
> >>>    Documentation/admin-guide/mm/transhuge.rst | 16 ----------------
> >>
> >> We really have to start documenting these, and what the semantics are.
> >
> > I think the diffstat is backwards; the series definitely adds more lines than it
> > removes. And patch 2 is adding 16 lines of docs, not removing them. How are you
> > generating this? `git format-patch` should do it correctly for you.
> >
> >>
> >> E.g., is split_page_failed contained in split_page? Is deferred_split_page
> >> contained in split_page?
> >>
> >> But also: just don't call it "split_page". Drop the "_page".
> >>
> >> split
> >> split_failed
> >> split_deferred
> >
> > I guess we are back in "should we be consistent with the existing vmstats"
> > territory, which uses split_page/split_page_failed/deferred_split_page
> >
>
> Yeah, "thp_split_page" really is odd: "transparent huge page split page".
>
> > But here, I agree that dropping _page is nicer.
>
> Right; we also shouldn't call it "thp_split_page" here :)

Yep, I understood. Let's drop the "_page".

split
split_failed
split_deferred

Thanks,
Lance

>
> --
> Cheers,
>
> David / dhildenb
>


* Re: [PATCH 0/2] mm: introduce per-order mTHP split counters
  2024-04-24 15:20   ` Ryan Roberts
  2024-04-24 15:29     ` David Hildenbrand
@ 2024-04-24 15:54     ` Lance Yang
  1 sibling, 0 replies; 16+ messages in thread
From: Lance Yang @ 2024-04-24 15:54 UTC
  To: Ryan Roberts
  Cc: David Hildenbrand, akpm, 21cnbao, baolin.wang, linux-kernel,
	linux-mm

On Wed, Apr 24, 2024 at 11:20 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 24/04/2024 16:00, David Hildenbrand wrote:
> > On 24.04.24 15:51, Lance Yang wrote:
> >> Hi all,
> >>
> >> At present, the split counters in THP statistics no longer include
> >> PTE-mapped mTHP. Therefore, we want to introduce per-order mTHP split
> >> counters to monitor the frequency of mTHP splits. This will assist
> >> developers in better analyzing and optimizing system performance.
> >>
> >> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
> >>          split_page
> >>          split_page_failed
> >>          deferred_split_page
> >>
> >> Thanks,
> >> Lance
> >> ---
> >>
> >> Lance Yang (2):
> >>   mm: add per-order mTHP split counters
> >>   mm: add docs for per-order mTHP split counters
> >>
> >>   Documentation/admin-guide/mm/transhuge.rst | 16 ----------------
> >
> > We really have to start documenting these, and what the semantics are.
>
> I think the diffstat is backwards; the series definitely adds more lines than it

Good spot! I'll sort it out.

Thanks,
Lance

> removes. And patch 2 is adding 16 lines of docs, not removing them. How are you
> generating this? `git format-patch` should do it correctly for you.
>
> >
> > E.g., is split_page_failed contained in split_page? Is deferred_split_page
> > contained in split_page?
> >
> > But also: just don't call it "split_page". Drop the "_page".
> >
> > split
> > split_failed
> > split_deferred
>
> I guess we are back in "should we be consistent with the existing vmstats"
> territory, which uses split_page/split_page_failed/deferred_split_page
>
> But here, I agree that dropping _page is nicer.
>
> >
> > ?
> >
>


* Re: [PATCH 1/2] mm: add per-order mTHP split counters
  2024-04-24 13:51 ` [PATCH 1/2] mm: add " Lance Yang
  2024-04-24 15:41   ` Ryan Roberts
@ 2024-04-24 17:12   ` Bang Li
  2024-04-24 17:58     ` Bang Li
  2024-04-24 19:44   ` Yang Shi
  2 siblings, 1 reply; 16+ messages in thread
From: Bang Li @ 2024-04-24 17:12 UTC
  To: Lance Yang
  Cc: akpm, 21cnbao, ryan.roberts, david, baolin.wang, linux-kernel,
	linux-mm

Hey Lance,

On 2024/4/24 21:51, Lance Yang wrote:

> At present, the split counters in THP statistics no longer include
> PTE-mapped mTHP. Therefore, this commit introduces per-order mTHP split
> counters to monitor the frequency of mTHP splits. This will assist
> developers in better analyzing and optimizing system performance.
>
> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
>          split_page
>          split_page_failed
>          deferred_split_page
>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> ---
>   include/linux/huge_mm.h |  3 +++
>   mm/huge_memory.c        | 14 ++++++++++++--
>   2 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 56c7ea73090b..7b9c6590e1f7 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -272,6 +272,9 @@ enum mthp_stat_item {
>   	MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
>   	MTHP_STAT_ANON_SWPOUT,
>   	MTHP_STAT_ANON_SWPOUT_FALLBACK,
> +	MTHP_STAT_SPLIT_PAGE,
> +	MTHP_STAT_SPLIT_PAGE_FAILED,
> +	MTHP_STAT_DEFERRED_SPLIT_PAGE,
>   	__MTHP_STAT_COUNT
>   };
>   
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 055df5aac7c3..52db888e47a6 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -557,6 +557,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
>   DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>   DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
>   DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
> +DEFINE_MTHP_STAT_ATTR(split_page, MTHP_STAT_SPLIT_PAGE);
> +DEFINE_MTHP_STAT_ATTR(split_page_failed, MTHP_STAT_SPLIT_PAGE_FAILED);
> +DEFINE_MTHP_STAT_ATTR(deferred_split_page, MTHP_STAT_DEFERRED_SPLIT_PAGE);
>   
>   static struct attribute *stats_attrs[] = {
>   	&anon_fault_alloc_attr.attr,
> @@ -564,6 +567,9 @@ static struct attribute *stats_attrs[] = {
>   	&anon_fault_fallback_charge_attr.attr,
>   	&anon_swpout_attr.attr,
>   	&anon_swpout_fallback_attr.attr,
> +	&split_page_attr.attr,
> +	&split_page_failed_attr.attr,
> +	&deferred_split_page_attr.attr,
>   	NULL,
>   };
>   
> @@ -3083,7 +3089,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>   	XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
>   	struct anon_vma *anon_vma = NULL;
>   	struct address_space *mapping = NULL;
> -	bool is_thp = folio_test_pmd_mappable(folio);
> +	int order = folio_order(folio);
>   	int extra_pins, ret;
>   	pgoff_t end;
>   	bool is_hzp;
> @@ -3262,8 +3268,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>   		i_mmap_unlock_read(mapping);
>   out:
>   	xas_destroy(&xas);
> -	if (is_thp)
> +	if (order >= HPAGE_PMD_ORDER)
>   		count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
> +	count_mthp_stat(order, !ret ? MTHP_STAT_SPLIT_PAGE :
> +				      MTHP_STAT_SPLIT_PAGE_FAILED);
>   	return ret;
>   }
>   
> @@ -3327,6 +3335,8 @@ void deferred_split_folio(struct folio *folio)
>   	if (list_empty(&folio->_deferred_list)) {
>   		if (folio_test_pmd_mappable(folio))
>   			count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> +		count_mthp_stat(folio_order(folio),
> +				MTHP_STAT_DEFERRED_SPLIT_PAGE);
>   		list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
>   		ds_queue->split_queue_len++;
>   #ifdef CONFIG_MEMCG

My opinion can be ignored :). Would it be better to modify the
deferred_split_folio function as follows? I'm not sure.

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 055df5aac7c3..e8562e8630b1 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3299,12 +3299,13 @@ void deferred_split_folio(struct folio *folio)
 	struct mem_cgroup *memcg = folio_memcg(folio);
 #endif
 	unsigned long flags;
+	int order = folio_order(folio);
 
 	/*
 	 * Order 1 folios have no space for a deferred list, but we also
 	 * won't waste much memory by not adding them to the deferred list.
 	 */
-	if (folio_order(folio) <= 1)
+	if (order <= 1)
 		return;
 
 	/*
@@ -3325,8 +3326,9 @@ void deferred_split_folio(struct folio *folio)
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	if (list_empty(&folio->_deferred_list)) {
-		if (folio_test_pmd_mappable(folio))
+		if (order >= HPAGE_PMD_ORDER)
 			count_vm_event(THP_DEFERRED_SPLIT_PAGE);
+		count_mthp_stat(order, MTHP_STAT_DEFERRED_SPLIT_PAGE);
 		list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
 		ds_queue->split_queue_len++;
 #ifdef CONFIG_MEMCG

thanks,
bang



* Re: [PATCH 1/2] mm: add per-order mTHP split counters
  2024-04-24 17:12   ` Bang Li
@ 2024-04-24 17:58     ` Bang Li
  2024-04-25  4:47       ` Lance Yang
  0 siblings, 1 reply; 16+ messages in thread
From: Bang Li @ 2024-04-24 17:58 UTC
  To: Lance Yang
  Cc: akpm, 21cnbao, ryan.roberts, david, baolin.wang, linux-kernel,
	linux-mm, Bang Li

Hey, sorry for the noise; there was something wrong with the format of
the last email.

On 2024/4/25 1:12, Bang Li wrote:
> Hey Lance,
>
> On 2024/4/24 21:51, Lance Yang wrote:
>
>> At present, the split counters in THP statistics no longer include
>> PTE-mapped mTHP. Therefore, this commit introduces per-order mTHP split
>> counters to monitor the frequency of mTHP splits. This will assist
>> developers in better analyzing and optimizing system performance.
>>
>> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
>>          split_page
>>          split_page_failed
>>          deferred_split_page
>>
>> Signed-off-by: Lance Yang <ioworker0@gmail.com>
>> ---
>>   include/linux/huge_mm.h |  3 +++
>>   mm/huge_memory.c        | 14 ++++++++++++--
>>   2 files changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 56c7ea73090b..7b9c6590e1f7 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -272,6 +272,9 @@ enum mthp_stat_item {
>>       MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
>>       MTHP_STAT_ANON_SWPOUT,
>>       MTHP_STAT_ANON_SWPOUT_FALLBACK,
>> +    MTHP_STAT_SPLIT_PAGE,
>> +    MTHP_STAT_SPLIT_PAGE_FAILED,
>> +    MTHP_STAT_DEFERRED_SPLIT_PAGE,
>>       __MTHP_STAT_COUNT
>>   };
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 055df5aac7c3..52db888e47a6 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -557,6 +557,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
>>   DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>>   DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
>>   DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
>> +DEFINE_MTHP_STAT_ATTR(split_page, MTHP_STAT_SPLIT_PAGE);
>> +DEFINE_MTHP_STAT_ATTR(split_page_failed, MTHP_STAT_SPLIT_PAGE_FAILED);
>> +DEFINE_MTHP_STAT_ATTR(deferred_split_page, MTHP_STAT_DEFERRED_SPLIT_PAGE);
>>
>>   static struct attribute *stats_attrs[] = {
>>       &anon_fault_alloc_attr.attr,
>> @@ -564,6 +567,9 @@ static struct attribute *stats_attrs[] = {
>>       &anon_fault_fallback_charge_attr.attr,
>>       &anon_swpout_attr.attr,
>>       &anon_swpout_fallback_attr.attr,
>> +    &split_page_attr.attr,
>> +    &split_page_failed_attr.attr,
>> +    &deferred_split_page_attr.attr,
>>       NULL,
>>   };
>>   @@ -3083,7 +3089,7 @@ int split_huge_page_to_list_to_order(struct 
>> page *page, struct list_head *list,
>>       XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, 
>> new_order);
>>       struct anon_vma *anon_vma = NULL;
>>       struct address_space *mapping = NULL;
>> -    bool is_thp = folio_test_pmd_mappable(folio);
>> +    int order = folio_order(folio);
>>       int extra_pins, ret;
>>       pgoff_t end;
>>       bool is_hzp;
>> @@ -3262,8 +3268,10 @@ int split_huge_page_to_list_to_order(struct 
>> page *page, struct list_head *list,
>>           i_mmap_unlock_read(mapping);
>>   out:
>>       xas_destroy(&xas);
>> -    if (is_thp)
>> +    if (order >= HPAGE_PMD_ORDER)
>>           count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
>> +    count_mthp_stat(order, !ret ? MTHP_STAT_SPLIT_PAGE :
>> +                      MTHP_STAT_SPLIT_PAGE_FAILED);
>>       return ret;
>>   }
>>   @@ -3327,6 +3335,8 @@ void deferred_split_folio(struct folio *folio)
>>       if (list_empty(&folio->_deferred_list)) {
>>           if (folio_test_pmd_mappable(folio))
>>               count_vm_event(THP_DEFERRED_SPLIT_PAGE);
>> +        count_mthp_stat(folio_order(folio),
>> +                MTHP_STAT_DEFERRED_SPLIT_PAGE);
>>           list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
>>           ds_queue->split_queue_len++;
>>   #ifdef CONFIG_MEMCG
>
> My opinion can be ignored :). Would it be better to modify the 
> deferred_split_folio
> function as follows? I'm not sure.
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 055df5aac7c3..e8562e8630b1 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3299,12 +3299,13 @@ void deferred_split_folio(struct folio *folio)
>         struct mem_cgroup *memcg = folio_memcg(folio);
>  #endif
>         unsigned long flags;
> +       int order = folio_order(folio);
>
>         /*
>          * Order 1 folios have no space for a deferred list, but we also
>          * won't waste much memory by not adding them to the deferred list.
>          */
> -       if (folio_order(folio) <= 1)
> +       if (order <= 1)
>                 return;
>
>         /*
> @@ -3325,8 +3326,9 @@ void deferred_split_folio(struct folio *folio)
>
>         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>         if (list_empty(&folio->_deferred_list)) {
> -               if (folio_test_pmd_mappable(folio))
> +               if (order >= HPAGE_PMD_ORDER)
>                         count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> +               count_mthp_stat(order, MTHP_STAT_DEFERRED_SPLIT_PAGE);
>                 list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
>                 ds_queue->split_queue_len++;
>  #ifdef CONFIG_MEMCG
>
> thanks,
> bang
>

My opinion can be ignored :). Would it be better to modify the 
deferred_split_folio
function as follows? I'm not sure.

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 055df5aac7c3..e8562e8630b1 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3299,12 +3299,13 @@ void deferred_split_folio(struct folio *folio)
         struct mem_cgroup *memcg = folio_memcg(folio);
  #endif
         unsigned long flags;
+       int order = folio_order(folio);

         /*
          * Order 1 folios have no space for a deferred list, but we also
          * won't waste much memory by not adding them to the deferred list.
          */
-       if (folio_order(folio) <= 1)
+       if (order <= 1)
                 return;

         /*
@@ -3325,8 +3326,9 @@ void deferred_split_folio(struct folio *folio)

         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
         if (list_empty(&folio->_deferred_list)) {
-               if (folio_test_pmd_mappable(folio))
+               if (order >= HPAGE_PMD_ORDER)
                         count_vm_event(THP_DEFERRED_SPLIT_PAGE);
+               count_mthp_stat(order, MTHP_STAT_DEFERRED_SPLIT_PAGE);
                 list_add_tail(&folio->_deferred_list, 
&ds_queue->split_queue);
                 ds_queue->split_queue_len++;
  #ifdef CONFIG_MEMCG

thanks,
bang


* Re: [PATCH 1/2] mm: add per-order mTHP split counters
  2024-04-24 13:51 ` [PATCH 1/2] mm: add " Lance Yang
  2024-04-24 15:41   ` Ryan Roberts
  2024-04-24 17:12   ` Bang Li
@ 2024-04-24 19:44   ` Yang Shi
  2024-04-25  5:13     ` Lance Yang
  2 siblings, 1 reply; 16+ messages in thread
From: Yang Shi @ 2024-04-24 19:44 UTC (permalink / raw
  To: Lance Yang, Zi Yan
  Cc: akpm, 21cnbao, ryan.roberts, david, baolin.wang, linux-kernel,
	linux-mm

On Wed, Apr 24, 2024 at 6:53 AM Lance Yang <ioworker0@gmail.com> wrote:
>
> At present, the split counters in THP statistics no longer include
> PTE-mapped mTHP. Therefore, this commit introduces per-order mTHP split
> counters to monitor the frequency of mTHP splits. This will assist
> developers in better analyzing and optimizing system performance.
>
> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
>         split_page
>         split_page_failed
>         deferred_split_page

The deferred_split_page counter may easily go insane with the fix from
https://lore.kernel.org/linux-mm/20240411153232.169560-1-zi.yan@sent.com/

Zi Yan,

Will you submit v2 for this patch soon?


>
> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> ---
>  include/linux/huge_mm.h |  3 +++
>  mm/huge_memory.c        | 14 ++++++++++++--
>  2 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 56c7ea73090b..7b9c6590e1f7 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -272,6 +272,9 @@ enum mthp_stat_item {
>         MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
>         MTHP_STAT_ANON_SWPOUT,
>         MTHP_STAT_ANON_SWPOUT_FALLBACK,
> +       MTHP_STAT_SPLIT_PAGE,
> +       MTHP_STAT_SPLIT_PAGE_FAILED,
> +       MTHP_STAT_DEFERRED_SPLIT_PAGE,
>         __MTHP_STAT_COUNT
>  };
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 055df5aac7c3..52db888e47a6 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -557,6 +557,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
>  DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>  DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
>  DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
> +DEFINE_MTHP_STAT_ATTR(split_page, MTHP_STAT_SPLIT_PAGE);
> +DEFINE_MTHP_STAT_ATTR(split_page_failed, MTHP_STAT_SPLIT_PAGE_FAILED);
> +DEFINE_MTHP_STAT_ATTR(deferred_split_page, MTHP_STAT_DEFERRED_SPLIT_PAGE);
>
>  static struct attribute *stats_attrs[] = {
>         &anon_fault_alloc_attr.attr,
> @@ -564,6 +567,9 @@ static struct attribute *stats_attrs[] = {
>         &anon_fault_fallback_charge_attr.attr,
>         &anon_swpout_attr.attr,
>         &anon_swpout_fallback_attr.attr,
> +       &split_page_attr.attr,
> +       &split_page_failed_attr.attr,
> +       &deferred_split_page_attr.attr,
>         NULL,
>  };
>
> @@ -3083,7 +3089,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>         XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
>         struct anon_vma *anon_vma = NULL;
>         struct address_space *mapping = NULL;
> -       bool is_thp = folio_test_pmd_mappable(folio);
> +       int order = folio_order(folio);
>         int extra_pins, ret;
>         pgoff_t end;
>         bool is_hzp;
> @@ -3262,8 +3268,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>                 i_mmap_unlock_read(mapping);
>  out:
>         xas_destroy(&xas);
> -       if (is_thp)
> +       if (order >= HPAGE_PMD_ORDER)
>                 count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
> +       count_mthp_stat(order, !ret ? MTHP_STAT_SPLIT_PAGE :
> +                                     MTHP_STAT_SPLIT_PAGE_FAILED);
>         return ret;
>  }
>
> @@ -3327,6 +3335,8 @@ void deferred_split_folio(struct folio *folio)
>         if (list_empty(&folio->_deferred_list)) {
>                 if (folio_test_pmd_mappable(folio))
>                         count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> +               count_mthp_stat(folio_order(folio),
> +                               MTHP_STAT_DEFERRED_SPLIT_PAGE);
>                 list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
>                 ds_queue->split_queue_len++;
>  #ifdef CONFIG_MEMCG
> --
> 2.33.1
>
>


* Re: [PATCH 1/2] mm: add per-order mTHP split counters
  2024-04-24 17:58     ` Bang Li
@ 2024-04-25  4:47       ` Lance Yang
  0 siblings, 0 replies; 16+ messages in thread
From: Lance Yang @ 2024-04-25  4:47 UTC (permalink / raw
  To: Bang Li
  Cc: akpm, 21cnbao, ryan.roberts, david, baolin.wang, linux-kernel,
	linux-mm

Hey Bang,

Thanks for taking the time to review!

On Thu, Apr 25, 2024 at 1:59 AM Bang Li <libang.linux@gmail.com> wrote:
>
> Hey, sorry for making noise, there was something wrong with the format of
> the last email.
>
> On 2024/4/25 1:12, Bang Li wrote:
> > Hey Lance,
> >
> > On 2024/4/24 21:51, Lance Yang wrote:
> >
> >> At present, the split counters in THP statistics no longer include
> >> PTE-mapped mTHP. Therefore, this commit introduces per-order mTHP split
> >> counters to monitor the frequency of mTHP splits. This will assist
> >> developers in better analyzing and optimizing system performance.
> >>
> >> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
> >>          split_page
> >>          split_page_failed
> >>          deferred_split_page
> >>
> >> Signed-off-by: Lance Yang <ioworker0@gmail.com>
> >> ---
> >>   include/linux/huge_mm.h |  3 +++
> >>   mm/huge_memory.c        | 14 ++++++++++++--
> >>   2 files changed, 15 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> >> index 56c7ea73090b..7b9c6590e1f7 100644
> >> --- a/include/linux/huge_mm.h
> >> +++ b/include/linux/huge_mm.h
> >> @@ -272,6 +272,9 @@ enum mthp_stat_item {
> >>       MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
> >>       MTHP_STAT_ANON_SWPOUT,
> >>       MTHP_STAT_ANON_SWPOUT_FALLBACK,
> >> +    MTHP_STAT_SPLIT_PAGE,
> >> +    MTHP_STAT_SPLIT_PAGE_FAILED,
> >> +    MTHP_STAT_DEFERRED_SPLIT_PAGE,
> >>       __MTHP_STAT_COUNT
> >>   };
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 055df5aac7c3..52db888e47a6 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -557,6 +557,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
> >>  DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> >>  DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
> >>  DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
> >> +DEFINE_MTHP_STAT_ATTR(split_page, MTHP_STAT_SPLIT_PAGE);
> >> +DEFINE_MTHP_STAT_ATTR(split_page_failed, MTHP_STAT_SPLIT_PAGE_FAILED);
> >> +DEFINE_MTHP_STAT_ATTR(deferred_split_page, MTHP_STAT_DEFERRED_SPLIT_PAGE);
> >>
> >>  static struct attribute *stats_attrs[] = {
> >>         &anon_fault_alloc_attr.attr,
> >> @@ -564,6 +567,9 @@ static struct attribute *stats_attrs[] = {
> >>         &anon_fault_fallback_charge_attr.attr,
> >>         &anon_swpout_attr.attr,
> >>         &anon_swpout_fallback_attr.attr,
> >> +       &split_page_attr.attr,
> >> +       &split_page_failed_attr.attr,
> >> +       &deferred_split_page_attr.attr,
> >>         NULL,
> >>  };
> >>
> >> @@ -3083,7 +3089,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >>         XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
> >>         struct anon_vma *anon_vma = NULL;
> >>         struct address_space *mapping = NULL;
> >> -       bool is_thp = folio_test_pmd_mappable(folio);
> >> +       int order = folio_order(folio);
> >>         int extra_pins, ret;
> >>         pgoff_t end;
> >>         bool is_hzp;
> >> @@ -3262,8 +3268,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >>                 i_mmap_unlock_read(mapping);
> >>  out:
> >>         xas_destroy(&xas);
> >> -       if (is_thp)
> >> +       if (order >= HPAGE_PMD_ORDER)
> >>                 count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
> >> +       count_mthp_stat(order, !ret ? MTHP_STAT_SPLIT_PAGE :
> >> +                                     MTHP_STAT_SPLIT_PAGE_FAILED);
> >>         return ret;
> >>  }
> >>
> >> @@ -3327,6 +3335,8 @@ void deferred_split_folio(struct folio *folio)
> >>         if (list_empty(&folio->_deferred_list)) {
> >>                 if (folio_test_pmd_mappable(folio))
> >>                         count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> >> +               count_mthp_stat(folio_order(folio),
> >> +                               MTHP_STAT_DEFERRED_SPLIT_PAGE);
> >>                 list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
> >>                 ds_queue->split_queue_len++;
> >>  #ifdef CONFIG_MEMCG
> >
> > My opinion can be ignored :). Would it be better to modify the
> > deferred_split_folio
> > function as follows? I'm not sure.
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 055df5aac7c3..e8562e8630b1 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -3299,12 +3299,13 @@ void deferred_split_folio(struct folio *folio)
> >         struct mem_cgroup *memcg = folio_memcg(folio);
> >  #endif
> >         unsigned long flags;
> > +       int order = folio_order(folio);
> >
> >         /*
> >          * Order 1 folios have no space for a deferred list, but we also
> >          * won't waste much memory by not adding them to the deferred list.
> >          */
> > -       if (folio_order(folio) <= 1)
> > +       if (order <= 1)
> >                 return;
> >
> >         /*
> > @@ -3325,8 +3326,9 @@ void deferred_split_folio(struct folio *folio)
> >
> >         spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> >         if (list_empty(&folio->_deferred_list)) {
> > -               if (folio_test_pmd_mappable(folio))
> > +               if (order >= HPAGE_PMD_ORDER)
> >                         count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> > +               count_mthp_stat(order, MTHP_STAT_DEFERRED_SPLIT_PAGE);
> >                 list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
> >                 ds_queue->split_queue_len++;
> >  #ifdef CONFIG_MEMCG
> >
> > thanks,
> > bang
> >
>
> My opinion can be ignored :). Would it be better to modify the
> deferred_split_folio
> function as follows? I'm not sure.
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 055df5aac7c3..e8562e8630b1 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3299,12 +3299,13 @@ void deferred_split_folio(struct folio *folio)
>          struct mem_cgroup *memcg = folio_memcg(folio);
>   #endif
>          unsigned long flags;
> +       int order = folio_order(folio);

I'll consider caching the folio order in a local variable earlier so it
can be reused later.

Thanks,
Lance

>
>          /*
>           * Order 1 folios have no space for a deferred list, but we also
>           * won't waste much memory by not adding them to the deferred list.
>           */
> -       if (folio_order(folio) <= 1)
> +       if (order <= 1)
>                  return;
>
>          /*
> @@ -3325,8 +3326,9 @@ void deferred_split_folio(struct folio *folio)
>
>          spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>          if (list_empty(&folio->_deferred_list)) {
> -               if (folio_test_pmd_mappable(folio))
> +               if (order >= HPAGE_PMD_ORDER)
>                          count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> +               count_mthp_stat(order, MTHP_STAT_DEFERRED_SPLIT_PAGE);
>                  list_add_tail(&folio->_deferred_list,
> &ds_queue->split_queue);
>                  ds_queue->split_queue_len++;
>   #ifdef CONFIG_MEMCG
>
> thanks,
> bang


* Re: [PATCH 1/2] mm: add per-order mTHP split counters
  2024-04-24 19:44   ` Yang Shi
@ 2024-04-25  5:13     ` Lance Yang
  0 siblings, 0 replies; 16+ messages in thread
From: Lance Yang @ 2024-04-25  5:13 UTC (permalink / raw
  To: Yang Shi
  Cc: Zi Yan, akpm, 21cnbao, ryan.roberts, david, baolin.wang,
	linux-kernel, linux-mm

Hey Yang,

Thanks for taking the time to review!

On Thu, Apr 25, 2024 at 3:44 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Wed, Apr 24, 2024 at 6:53 AM Lance Yang <ioworker0@gmail.com> wrote:
> >
> > At present, the split counters in THP statistics no longer include
> > PTE-mapped mTHP. Therefore, this commit introduces per-order mTHP split
> > counters to monitor the frequency of mTHP splits. This will assist
> > developers in better analyzing and optimizing system performance.
> >
> > /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
> >         split_page
> >         split_page_failed
> >         deferred_split_page
>
> The deferred_split_page counter may easily go insane with the fix from
> https://lore.kernel.org/linux-mm/20240411153232.169560-1-zi.yan@sent.com/
>
> Zi Yan,
>
> Will you submit v2 for this patch soon?

Thanks for bringing this to my attention!
I'll follow Zi Yan's patch, then submit the next version.

Thanks,
Lance

>
>
> >
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
> > ---
> >  include/linux/huge_mm.h |  3 +++
> >  mm/huge_memory.c        | 14 ++++++++++++--
> >  2 files changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 56c7ea73090b..7b9c6590e1f7 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -272,6 +272,9 @@ enum mthp_stat_item {
> >         MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
> >         MTHP_STAT_ANON_SWPOUT,
> >         MTHP_STAT_ANON_SWPOUT_FALLBACK,
> > +       MTHP_STAT_SPLIT_PAGE,
> > +       MTHP_STAT_SPLIT_PAGE_FAILED,
> > +       MTHP_STAT_DEFERRED_SPLIT_PAGE,
> >         __MTHP_STAT_COUNT
> >  };
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 055df5aac7c3..52db888e47a6 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -557,6 +557,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
> >  DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> >  DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
> >  DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
> > +DEFINE_MTHP_STAT_ATTR(split_page, MTHP_STAT_SPLIT_PAGE);
> > +DEFINE_MTHP_STAT_ATTR(split_page_failed, MTHP_STAT_SPLIT_PAGE_FAILED);
> > +DEFINE_MTHP_STAT_ATTR(deferred_split_page, MTHP_STAT_DEFERRED_SPLIT_PAGE);
> >
> >  static struct attribute *stats_attrs[] = {
> >         &anon_fault_alloc_attr.attr,
> > @@ -564,6 +567,9 @@ static struct attribute *stats_attrs[] = {
> >         &anon_fault_fallback_charge_attr.attr,
> >         &anon_swpout_attr.attr,
> >         &anon_swpout_fallback_attr.attr,
> > +       &split_page_attr.attr,
> > +       &split_page_failed_attr.attr,
> > +       &deferred_split_page_attr.attr,
> >         NULL,
> >  };
> >
> > @@ -3083,7 +3089,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >         XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
> >         struct anon_vma *anon_vma = NULL;
> >         struct address_space *mapping = NULL;
> > -       bool is_thp = folio_test_pmd_mappable(folio);
> > +       int order = folio_order(folio);
> >         int extra_pins, ret;
> >         pgoff_t end;
> >         bool is_hzp;
> > @@ -3262,8 +3268,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >                 i_mmap_unlock_read(mapping);
> >  out:
> >         xas_destroy(&xas);
> > -       if (is_thp)
> > +       if (order >= HPAGE_PMD_ORDER)
> >                 count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
> > +       count_mthp_stat(order, !ret ? MTHP_STAT_SPLIT_PAGE :
> > +                                     MTHP_STAT_SPLIT_PAGE_FAILED);
> >         return ret;
> >  }
> >
> > @@ -3327,6 +3335,8 @@ void deferred_split_folio(struct folio *folio)
> >         if (list_empty(&folio->_deferred_list)) {
> >                 if (folio_test_pmd_mappable(folio))
> >                         count_vm_event(THP_DEFERRED_SPLIT_PAGE);
> > +               count_mthp_stat(folio_order(folio),
> > +                               MTHP_STAT_DEFERRED_SPLIT_PAGE);
> >                 list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
> >                 ds_queue->split_queue_len++;
> >  #ifdef CONFIG_MEMCG
> > --
> > 2.33.1
> >
> >


* Re: [PATCH 2/2] mm: add docs for per-order mTHP split counters
  2024-04-24 15:34   ` Ryan Roberts
@ 2024-04-25  5:26     ` Lance Yang
  0 siblings, 0 replies; 16+ messages in thread
From: Lance Yang @ 2024-04-25  5:26 UTC (permalink / raw
  To: Ryan Roberts; +Cc: akpm, 21cnbao, david, baolin.wang, linux-kernel, linux-mm

Hey Ryan,

Thanks for taking the time to review!

On Wed, Apr 24, 2024 at 11:34 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 24/04/2024 14:51, Lance Yang wrote:
> > This commit introduces documentation for mTHP split counters in
> > transhuge.rst.
> >
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
> > ---
> >  Documentation/admin-guide/mm/transhuge.rst | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index f82300b9193f..35d574a531c8 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -475,6 +475,22 @@ anon_swpout_fallback
> >       Usually because failed to allocate some continuous swap space
> >       for the huge page.
> >
> > +split_page
> > +     is incremented every time a huge page is split into base
>
> perhaps "...successfully split into base..." to make it clear that this is only
> incremented on success.

Yep. Your suggestion does make it clearer.

>
> > +     pages. This can happen for a variety of reasons but a common
> > +     reason is that a huge page is old and is being reclaimed.
> > +     This action implies splitting all PMD/PTE mapped with the huge page.
>
> What does it mean to "split all PTE"? It's already at its smallest granularity.
> Perhaps "This action implies splitting any block mappings into PTEs."?

Nice. It would be clearer and better!

Thanks again for the suggestions!
Lance

>
> > +
> > +split_page_failed
> > +     is incremented if kernel fails to split huge
> > +     page. This can happen if the page was pinned by somebody.
> > +
> > +deferred_split_page
> > +     is incremented when a huge page is put onto split
> > +     queue. This happens when a huge page is partially unmapped and
> > +     splitting it would free up some memory. Pages on split queue are
> > +     going to be split under memory pressure.
> > +
> >  As the system ages, allocating huge pages may be expensive as the
> >  system uses memory compaction to copy data around memory to free a
> >  huge page for use. There are some counters in ``/proc/vmstat`` to help
>


end of thread, other threads:[~2024-04-25  5:26 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-24 13:51 [PATCH 0/2] mm: introduce per-order mTHP split counters Lance Yang
2024-04-24 13:51 ` [PATCH 1/2] mm: add " Lance Yang
2024-04-24 15:41   ` Ryan Roberts
2024-04-24 17:12   ` Bang Li
2024-04-24 17:58     ` Bang Li
2024-04-25  4:47       ` Lance Yang
2024-04-24 19:44   ` Yang Shi
2024-04-25  5:13     ` Lance Yang
2024-04-24 13:51 ` [PATCH 2/2] mm: add docs for " Lance Yang
2024-04-24 15:34   ` Ryan Roberts
2024-04-25  5:26     ` Lance Yang
2024-04-24 15:00 ` [PATCH 0/2] mm: introduce " David Hildenbrand
2024-04-24 15:20   ` Ryan Roberts
2024-04-24 15:29     ` David Hildenbrand
2024-04-24 15:53       ` Lance Yang
2024-04-24 15:54     ` Lance Yang
