LKML Archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] Enhance soft hwpoison handling and injection
@ 2024-05-01 23:24 Jane Chu
  2024-05-01 23:24 ` [PATCH 1/3] mm/memory-failure: try to send SIGBUS even if unmap failed Jane Chu
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Jane Chu @ 2024-05-01 23:24 UTC (permalink / raw
  To: linmiaohe, nao.horiguchi, akpm, linux-mm, linux-kernel

This series aim at the following enhancement -
1. Let one hwpoison injector, that is, madvise(MADV_HWPOISON) to behave
   more like as if a real UE occurred. Because the other two injectors
   such as hwpoison-inject and the 'einj' on x86 can't, and it seems to
   me we need a better simulation to real UE scenario.
2. For years, if the kernel is unable to unmap a hwpoisoned page, it send
   a SIGKILL instead of SIGBUS to prevent user process from potentially
   accessing the page again. But in doing so, the user process also lose
   important information: vaddr, for recovery.  Fortunately, the kernel
   already has code to kill process re-accessing a hwpoisoned page, so
   remove the '!unmap_success' check.
3. Right now, if a thp page under GUP longterm pin is hwpoisoned, and
   kernel cannot split the thp page, memory-failure simply ignores
   the UE and returns.  That's not ideal, it could deliver a SIGBUS with
   useful information for userspace recovery.


Jane Chu (3):
  mm/memory-failure: try to send SIGBUS even if unmap failed
  mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON)
  mm/memory-failure: send SIGBUS in the event of thp split fail

 mm/madvise.c        |  2 +-
 mm/memory-failure.c | 49 ++++++++++++++++++++++++++++++++++++---------
 2 files changed, 41 insertions(+), 10 deletions(-)

-- 
2.39.3


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/3] mm/memory-failure: try to send SIGBUS even if unmap failed
  2024-05-01 23:24 [PATCH 0/3] Enhance soft hwpoison handling and injection Jane Chu
@ 2024-05-01 23:24 ` Jane Chu
  2024-05-07  9:02   ` Oscar Salvador
                     ` (2 more replies)
  2024-05-01 23:24 ` [PATCH 2/3] mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON) Jane Chu
  2024-05-01 23:24 ` [PATCH 3/3] mm/memory-failure: send SIGBUS in the event of thp split fail Jane Chu
  2 siblings, 3 replies; 26+ messages in thread
From: Jane Chu @ 2024-05-01 23:24 UTC (permalink / raw
  To: linmiaohe, nao.horiguchi, akpm, linux-mm, linux-kernel

For years when it comes down to kill a process due to hwpoison,
a SIGBUS is delivered only if unmap has been successful.
Otherwise, a SIGKILL is delivered. And the reason for that is
to prevent the involved process from accessing the hwpoisoned
page again.

Since then a lot has changed, a hwpoisoned page is marked and
upon being re-accessed, the process will be killed immediately.
So let's take out the '!unmap_success' factor and try to deliver
SIGBUS if possible.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 mm/memory-failure.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 9e62a00b46dd..7fcf182abb96 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -519,19 +519,14 @@ void add_to_kill_ksm(struct task_struct *tsk, struct page *p,
  * Also when FAIL is set do a force kill because something went
  * wrong earlier.
  */
-static void kill_procs(struct list_head *to_kill, int forcekill, bool fail,
+static void kill_procs(struct list_head *to_kill, int forcekill,
 		unsigned long pfn, int flags)
 {
 	struct to_kill *tk, *next;
 
 	list_for_each_entry_safe(tk, next, to_kill, nd) {
 		if (forcekill) {
-			/*
-			 * In case something went wrong with munmapping
-			 * make sure the process doesn't catch the
-			 * signal and then access the memory. Just kill it.
-			 */
-			if (fail || tk->addr == -EFAULT) {
+			if (tk->addr == -EFAULT) {
 				pr_err("%#lx: forcibly killing %s:%d because of failure to unmap corrupted page\n",
 				       pfn, tk->tsk->comm, tk->tsk->pid);
 				do_send_sig_info(SIGKILL, SEND_SIG_PRIV,
@@ -1666,7 +1661,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	 */
 	forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL) ||
 		    !unmap_success;
-	kill_procs(&tokill, forcekill, !unmap_success, pfn, flags);
+	kill_procs(&tokill, forcekill, pfn, flags);
 
 	return unmap_success;
 }
@@ -1730,7 +1725,7 @@ static void unmap_and_kill(struct list_head *to_kill, unsigned long pfn,
 		unmap_mapping_range(mapping, start, size, 0);
 	}
 
-	kill_procs(to_kill, flags & MF_MUST_KILL, false, pfn, flags);
+	kill_procs(to_kill, flags & MF_MUST_KILL, pfn, flags);
 }
 
 /*
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/3] mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON)
  2024-05-01 23:24 [PATCH 0/3] Enhance soft hwpoison handling and injection Jane Chu
  2024-05-01 23:24 ` [PATCH 1/3] mm/memory-failure: try to send SIGBUS even if unmap failed Jane Chu
@ 2024-05-01 23:24 ` Jane Chu
  2024-05-05  7:02   ` Miaohe Lin
  2024-05-01 23:24 ` [PATCH 3/3] mm/memory-failure: send SIGBUS in the event of thp split fail Jane Chu
  2 siblings, 1 reply; 26+ messages in thread
From: Jane Chu @ 2024-05-01 23:24 UTC (permalink / raw
  To: linmiaohe, nao.horiguchi, akpm, linux-mm, linux-kernel

The soft hwpoison injector via madvise(MADV_HWPOISON) operates in
a synchrous way in a sense, the injector is also a process under
test, and should it have the poisoned page mapped in its address
space, it should legitimately get killed as much as in a real UE
situation.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 mm/madvise.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 1a073fcc4c0c..eaeae5252c02 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1127,7 +1127,7 @@ static int madvise_inject_error(int behavior,
 		} else {
 			pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
 				 pfn, start);
-			ret = memory_failure(pfn, MF_COUNT_INCREASED | MF_SW_SIMULATED);
+			ret = memory_failure(pfn, MF_ACTION_REQUIRED | MF_COUNT_INCREASED | MF_SW_SIMULATED);
 			if (ret == -EOPNOTSUPP)
 				ret = 0;
 		}
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/3] mm/memory-failure: send SIGBUS in the event of thp split fail
  2024-05-01 23:24 [PATCH 0/3] Enhance soft hwpoison handling and injection Jane Chu
  2024-05-01 23:24 ` [PATCH 1/3] mm/memory-failure: try to send SIGBUS even if unmap failed Jane Chu
  2024-05-01 23:24 ` [PATCH 2/3] mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON) Jane Chu
@ 2024-05-01 23:24 ` Jane Chu
  2024-05-05  7:00   ` Miaohe Lin
  2024-05-08  9:03   ` Miaohe Lin
  2 siblings, 2 replies; 26+ messages in thread
From: Jane Chu @ 2024-05-01 23:24 UTC (permalink / raw
  To: linmiaohe, nao.horiguchi, akpm, linux-mm, linux-kernel

When handle hwpoison in a GUP longterm pin'ed thp page,
try_to_split_thp_page() will fail. And at this point, there is little else
the kernel could do except sending a SIGBUS to the user process, thus
give it a chance to recover.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 7fcf182abb96..67f4d24a98e7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	return rc;
 }
 
+/*
+ * The calling condition is as such: thp split failed, page might have
+ * been GUP longterm pinned, not much can be done for recovery.
+ * But a SIGBUS should be delivered with vaddr provided so that the user
+ * application has a chance to recover. Also, application processes'
+ * election for MCE early killed will be honored.
+ */
+static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
+			struct page *hpage)
+{
+	struct folio *folio = page_folio(hpage);
+	LIST_HEAD(tokill);
+	int res = -EHWPOISON;
+
+	/* deal with user pages only */
+	if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p))
+		res = -EBUSY;
+	if (!(PageLRU(hpage) || PageHuge(p)))
+		res = -EBUSY;
+
+	if (res == -EHWPOISON) {
+		collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
+		kill_procs(&tokill, true, pfn, flags);
+	}
+
+	if (flags & MF_COUNT_INCREASED)
+		put_page(p);
+
+	return res;
+}
+
 /**
  * memory_failure - Handle memory failure of a page.
  * @pfn: Page Number of the corrupted page
@@ -2297,6 +2328,11 @@ int memory_failure(unsigned long pfn, int flags)
 		 */
 		SetPageHasHWPoisoned(hpage);
 		if (try_to_split_thp_page(p) < 0) {
+			if (flags & MF_ACTION_REQUIRED) {
+				pr_err("%#lx: thp split failed\n", pfn);
+				res = kill_procs_now(p, pfn, flags, hpage);
+				goto unlock_mutex;
+			}
 			res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
 			goto unlock_mutex;
 		}
-- 
2.39.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] mm/memory-failure: send SIGBUS in the event of thp split fail
  2024-05-01 23:24 ` [PATCH 3/3] mm/memory-failure: send SIGBUS in the event of thp split fail Jane Chu
@ 2024-05-05  7:00   ` Miaohe Lin
  2024-05-06 20:26     ` Jane Chu
  2024-05-08  9:03   ` Miaohe Lin
  1 sibling, 1 reply; 26+ messages in thread
From: Miaohe Lin @ 2024-05-05  7:00 UTC (permalink / raw
  To: Jane Chu, nao.horiguchi, akpm, linux-mm, linux-kernel

On 2024/5/2 7:24, Jane Chu wrote:
> When handle hwpoison in a GUP longterm pin'ed thp page,
> try_to_split_thp_page() will fail. And at this point, there is little else
> the kernel could do except sending a SIGBUS to the user process, thus
> give it a chance to recover.
> 
> Signed-off-by: Jane Chu <jane.chu@oracle.com>

Thanks for your patch. Some comments below.

> ---
>  mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 7fcf182abb96..67f4d24a98e7 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>  	return rc;
>  }
>  
> +/*
> + * The calling condition is as such: thp split failed, page might have
> + * been GUP longterm pinned, not much can be done for recovery.
> + * But a SIGBUS should be delivered with vaddr provided so that the user
> + * application has a chance to recover. Also, application processes'
> + * election for MCE early killed will be honored.
> + */
> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
> +			struct page *hpage)
> +{
> +	struct folio *folio = page_folio(hpage);
> +	LIST_HEAD(tokill);
> +	int res = -EHWPOISON;
> +
> +	/* deal with user pages only */
> +	if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p))
> +		res = -EBUSY;
> +	if (!(PageLRU(hpage) || PageHuge(p)))
> +		res = -EBUSY;

Above checks seems unneeded. We already know it's thp?

> +
> +	if (res == -EHWPOISON) {
> +		collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
> +		kill_procs(&tokill, true, pfn, flags);
> +	}
> +
> +	if (flags & MF_COUNT_INCREASED)
> +		put_page(p);

This if block is broken. put_page() has been done when try_to_split_thp_page() fails?

> +

action_result is missing?

> +	return res;
> +}
> +
>  /**
>   * memory_failure - Handle memory failure of a page.
>   * @pfn: Page Number of the corrupted page
> @@ -2297,6 +2328,11 @@ int memory_failure(unsigned long pfn, int flags)
>  		 */
>  		SetPageHasHWPoisoned(hpage);
>  		if (try_to_split_thp_page(p) < 0) {

Should hwpoison_filter() be called in this case?

> +			if (flags & MF_ACTION_REQUIRED) {
> +				pr_err("%#lx: thp split failed\n", pfn);
> +				res = kill_procs_now(p, pfn, flags, hpage);

Can we use hwpoison_user_mappings() directly here?

Thanks.
.

> +				goto unlock_mutex;
> +			}
>  			res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
>  			goto unlock_mutex;
>  		}
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/3] mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON)
  2024-05-01 23:24 ` [PATCH 2/3] mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON) Jane Chu
@ 2024-05-05  7:02   ` Miaohe Lin
  2024-05-06 19:54     ` Jane Chu
  0 siblings, 1 reply; 26+ messages in thread
From: Miaohe Lin @ 2024-05-05  7:02 UTC (permalink / raw
  To: Jane Chu, nao.horiguchi, akpm, linux-mm, linux-kernel

On 2024/5/2 7:24, Jane Chu wrote:
> The soft hwpoison injector via madvise(MADV_HWPOISON) operates in
> a synchrous way in a sense, the injector is also a process under
> test, and should it have the poisoned page mapped in its address
> space, it should legitimately get killed as much as in a real UE
> situation.

Will it be better to add a method to set MF_ACTION_REQUIRED explicitly when inject soft hwpoison?
Thanks.
.

> 
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---
>  mm/madvise.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 1a073fcc4c0c..eaeae5252c02 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1127,7 +1127,7 @@ static int madvise_inject_error(int behavior,
>  		} else {
>  			pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
>  				 pfn, start);
> -			ret = memory_failure(pfn, MF_COUNT_INCREASED | MF_SW_SIMULATED);
> +			ret = memory_failure(pfn, MF_ACTION_REQUIRED | MF_COUNT_INCREASED | MF_SW_SIMULATED);
>  			if (ret == -EOPNOTSUPP)
>  				ret = 0;
>  		}
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/3] mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON)
  2024-05-05  7:02   ` Miaohe Lin
@ 2024-05-06 19:54     ` Jane Chu
  2024-05-08  7:58       ` Miaohe Lin
  0 siblings, 1 reply; 26+ messages in thread
From: Jane Chu @ 2024-05-06 19:54 UTC (permalink / raw
  To: Miaohe Lin, nao.horiguchi, akpm, linux-mm, linux-kernel

On 5/5/2024 12:02 AM, Miaohe Lin wrote:

> On 2024/5/2 7:24, Jane Chu wrote:
>> The soft hwpoison injector via madvise(MADV_HWPOISON) operates in
>> a synchrous way in a sense, the injector is also a process under
>> test, and should it have the poisoned page mapped in its address
>> space, it should legitimately get killed as much as in a real UE
>> situation.
> Will it be better to add a method to set MF_ACTION_REQUIRED explicitly when inject soft hwpoison?
> Thanks.

So the first question is: Is there a need to preserve the existing 
behavior of  madvise(MADV_HWPOISON)?

The madvise(2) man page says -

        *MADV_HWPOISON *(since Linux 2.6.32)
               Poison the pages in the range specified by/addr/  and/length/
               and handle subsequent references to those pages like a
               hardware memory corruption.  This operation is available
               only for privileged (*CAP_SYS_ADMIN*) processes.  This
               operation may result in the calling process receiving a
               *SIGBUS *and the page being unmapped.

               This feature is intended for testing of memory error-
               handling code; it is available only if the kernel was
               configured with*CONFIG_MEMORY_FAILURE*.

And the impression from my reading is that: there doesn't seem to be a need.

A couple observations -
- The man page states that the calling process may receive a SIGBUS and the page being unmapped.
But the existing behavior is no SIGBUS unless MCE early kill is elected, so it doesn't quite match
the man page.
- There is 'hwpoison-inject' which behaves similar to the existing madvise(MADV_HWPOISON), that is,
soft inject without MF_ACTION_REQUIRED flag.

thanks,
-jane

> .
>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>> ---
>>   mm/madvise.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 1a073fcc4c0c..eaeae5252c02 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -1127,7 +1127,7 @@ static int madvise_inject_error(int behavior,
>>   		} else {
>>   			pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
>>   				 pfn, start);
>> -			ret = memory_failure(pfn, MF_COUNT_INCREASED | MF_SW_SIMULATED);
>> +			ret = memory_failure(pfn, MF_ACTION_REQUIRED | MF_COUNT_INCREASED | MF_SW_SIMULATED);
>>   			if (ret == -EOPNOTSUPP)
>>   				ret = 0;
>>   		}
>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] mm/memory-failure: send SIGBUS in the event of thp split fail
  2024-05-05  7:00   ` Miaohe Lin
@ 2024-05-06 20:26     ` Jane Chu
  2024-05-08  8:08       ` Miaohe Lin
  0 siblings, 1 reply; 26+ messages in thread
From: Jane Chu @ 2024-05-06 20:26 UTC (permalink / raw
  To: Miaohe Lin, nao.horiguchi, akpm, linux-mm, linux-kernel

On 5/5/2024 12:00 AM, Miaohe Lin wrote:

> On 2024/5/2 7:24, Jane Chu wrote:
>> When handle hwpoison in a GUP longterm pin'ed thp page,
>> try_to_split_thp_page() will fail. And at this point, there is little else
>> the kernel could do except sending a SIGBUS to the user process, thus
>> give it a chance to recover.
>>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> Thanks for your patch. Some comments below.
>
>> ---
>>   mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
>>   1 file changed, 36 insertions(+)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 7fcf182abb96..67f4d24a98e7 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>>   	return rc;
>>   }
>>   
>> +/*
>> + * The calling condition is as such: thp split failed, page might have
>> + * been GUP longterm pinned, not much can be done for recovery.
>> + * But a SIGBUS should be delivered with vaddr provided so that the user
>> + * application has a chance to recover. Also, application processes'
>> + * election for MCE early killed will be honored.
>> + */
>> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
>> +			struct page *hpage)
>> +{
>> +	struct folio *folio = page_folio(hpage);
>> +	LIST_HEAD(tokill);
>> +	int res = -EHWPOISON;
>> +
>> +	/* deal with user pages only */
>> +	if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p))
>> +		res = -EBUSY;
>> +	if (!(PageLRU(hpage) || PageHuge(p)))
>> +		res = -EBUSY;
> Above checks seems unneeded. We already know it's thp?

Agreed.

I  lifted these checks from hwpoison_user_mapping() with a hope to make 
kill_procs_now() more generic,

such as, potentially replacing kill_accessing_processes() for 
re-accessing hwpoisoned page.

But I backed out at last, due to concerns that my tests might not have 
covered sufficient number of scenarios.

>
>> +
>> +	if (res == -EHWPOISON) {
>> +		collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
>> +		kill_procs(&tokill, true, pfn, flags);
>> +	}
>> +
>> +	if (flags & MF_COUNT_INCREASED)
>> +		put_page(p);
> This if block is broken. put_page() has been done when try_to_split_thp_page() fails?

put_page() has not been done if try_to_split_thp_page() fails, and I 
think it should.

I will revise the code so that put_page() is called regardless 
MF_ACTION_REQUIRED is set or not.

>
>> +
> action_result is missing?

Indeed,  action_result() isn't always called, referring to the 
re-accessing hwpoison scenarios.

In this case, I think the reason  is that, we just killed the process 
and there is nothing

else to do or to report.

>
>> +	return res;
>> +}
>> +
>>   /**
>>    * memory_failure - Handle memory failure of a page.
>>    * @pfn: Page Number of the corrupted page
>> @@ -2297,6 +2328,11 @@ int memory_failure(unsigned long pfn, int flags)
>>   		 */
>>   		SetPageHasHWPoisoned(hpage);
>>   		if (try_to_split_thp_page(p) < 0) {
> Should hwpoison_filter() be called in this case?
Yes, it should. I will add the hwpoison_filter check.
>
>> +			if (flags & MF_ACTION_REQUIRED) {
>> +				pr_err("%#lx: thp split failed\n", pfn);
>> +				res = kill_procs_now(p, pfn, flags, hpage);
> Can we use hwpoison_user_mappings() directly here?

I thought about using hwpoison_user_mappings() with an extra flag, but 
gave up in the end.

Because most of the code there are not needed, such as the checks you 
mentioned above,

and umapping etc.

thanks!

-jane

>
> Thanks.
> .
>
>> +				goto unlock_mutex;
>> +			}
>>   			res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
>>   			goto unlock_mutex;
>>   		}
>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] mm/memory-failure: try to send SIGBUS even if unmap failed
  2024-05-01 23:24 ` [PATCH 1/3] mm/memory-failure: try to send SIGBUS even if unmap failed Jane Chu
@ 2024-05-07  9:02   ` Oscar Salvador
  2024-05-07 17:54     ` Jane Chu
  2024-05-08  7:47   ` Miaohe Lin
  2024-05-09  2:54   ` Miaohe Lin
  2 siblings, 1 reply; 26+ messages in thread
From: Oscar Salvador @ 2024-05-07  9:02 UTC (permalink / raw
  To: Jane Chu; +Cc: linmiaohe, nao.horiguchi, akpm, linux-mm, linux-kernel

On Wed, May 01, 2024 at 05:24:56PM -0600, Jane Chu wrote:
> For years when it comes down to kill a process due to hwpoison,
> a SIGBUS is delivered only if unmap has been successful.
> Otherwise, a SIGKILL is delivered. And the reason for that is
> to prevent the involved process from accessing the hwpoisoned
> page again.
> 
> Since then a lot has changed, a hwpoisoned page is marked and
> upon being re-accessed, the process will be killed immediately.
> So let's take out the '!unmap_success' factor and try to deliver
> SIGBUS if possible.

I am missing some details here.
An unmapped hwpoison page will trigger a fault and will return
VM_FAULT_HWPOISON all the way down and then deliver SIGBUS,
but if the page was not unmapped, how will this be catch upon
re-accessing? Will the system deliver a MCE event? 


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] mm/memory-failure: try to send SIGBUS even if unmap failed
  2024-05-07  9:02   ` Oscar Salvador
@ 2024-05-07 17:54     ` Jane Chu
  2024-05-08 12:06       ` Oscar Salvador
  0 siblings, 1 reply; 26+ messages in thread
From: Jane Chu @ 2024-05-07 17:54 UTC (permalink / raw
  To: Oscar Salvador; +Cc: linmiaohe, nao.horiguchi, akpm, linux-mm, linux-kernel

On 5/7/2024 2:02 AM, Oscar Salvador wrote:

> On Wed, May 01, 2024 at 05:24:56PM -0600, Jane Chu wrote:
>> For years when it comes down to kill a process due to hwpoison,
>> a SIGBUS is delivered only if unmap has been successful.
>> Otherwise, a SIGKILL is delivered. And the reason for that is
>> to prevent the involved process from accessing the hwpoisoned
>> page again.
>>
>> Since then a lot has changed, a hwpoisoned page is marked and
>> upon being re-accessed, the process will be killed immediately.
>> So let's take out the '!unmap_success' factor and try to deliver
>> SIGBUS if possible.
> I am missing some details here.
> An unmapped hwpoison page will trigger a fault and will return
> VM_FAULT_HWPOISON all the way down and then deliver SIGBUS,
> but if the page was not unmapped, how will this be catch upon
> re-accessing? Will the system deliver a MCE event?
>
I actually managed to hit the re-access case with an older version of 
Linux -

MCE occurred, but unmap failed,  no SIGBUS and test process re-access

the same address over and over (hence MCE after MCE), as the CPU

was unable to make forward progress.   In reality, this issue is fixed with

kill_accessing_processes().  The comment for this patch refers to 
comment made

about '!unmap_access' long time ago.

thanks,

-jane



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] mm/memory-failure: try to send SIGBUS even if unmap failed
  2024-05-01 23:24 ` [PATCH 1/3] mm/memory-failure: try to send SIGBUS even if unmap failed Jane Chu
  2024-05-07  9:02   ` Oscar Salvador
@ 2024-05-08  7:47   ` Miaohe Lin
  2024-05-08 16:58     ` Jane Chu
  2024-05-09  2:54   ` Miaohe Lin
  2 siblings, 1 reply; 26+ messages in thread
From: Miaohe Lin @ 2024-05-08  7:47 UTC (permalink / raw
  To: Jane Chu, nao.horiguchi, akpm, linux-mm, linux-kernel

On 2024/5/2 7:24, Jane Chu wrote:
> For years when it comes down to kill a process due to hwpoison,
> a SIGBUS is delivered only if unmap has been successful.
> Otherwise, a SIGKILL is delivered. And the reason for that is
> to prevent the involved process from accessing the hwpoisoned
> page again.
> 
> Since then a lot has changed, a hwpoisoned page is marked and
> upon being re-accessed, the process will be killed immediately.
> So let's take out the '!unmap_success' factor and try to deliver
> SIGBUS if possible.
> 
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---
>  mm/memory-failure.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 9e62a00b46dd..7fcf182abb96 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -519,19 +519,14 @@ void add_to_kill_ksm(struct task_struct *tsk, struct page *p,
>   * Also when FAIL is set do a force kill because something went
>   * wrong earlier.

Since @fail is removed, above comment should be removed too.
Thanks.
.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/3] mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON)
  2024-05-06 19:54     ` Jane Chu
@ 2024-05-08  7:58       ` Miaohe Lin
  0 siblings, 0 replies; 26+ messages in thread
From: Miaohe Lin @ 2024-05-08  7:58 UTC (permalink / raw
  To: Jane Chu, nao.horiguchi, akpm, linux-mm, linux-kernel

On 2024/5/7 3:54, Jane Chu wrote:
> On 5/5/2024 12:02 AM, Miaohe Lin wrote:
> 
>> On 2024/5/2 7:24, Jane Chu wrote:
>>> The soft hwpoison injector via madvise(MADV_HWPOISON) operates in
>>> a synchrous way in a sense, the injector is also a process under
>>> test, and should it have the poisoned page mapped in its address
>>> space, it should legitimately get killed as much as in a real UE
>>> situation.
>> Will it be better to add a method to set MF_ACTION_REQUIRED explicitly when inject soft hwpoison?
>> Thanks.
> 
> So the first question is: Is there a need to preserve the existing behavior of  madvise(MADV_HWPOISON)?
> 
> The madvise(2) man page says -
> 
>        *MADV_HWPOISON *(since Linux 2.6.32)
>               Poison the pages in the range specified by/addr/  and/length/
>               and handle subsequent references to those pages like a
>               hardware memory corruption.  This operation is available
>               only for privileged (*CAP_SYS_ADMIN*) processes.  This
>               operation may result in the calling process receiving a
>               *SIGBUS *and the page being unmapped.
> 
>               This feature is intended for testing of memory error-
>               handling code; it is available only if the kernel was
>               configured with*CONFIG_MEMORY_FAILURE*.
> 
> And the impression from my reading is that: there doesn't seem to be a need.
> 
> A couple observations -
> - The man page states that the calling process may receive a SIGBUS and the page being unmapped.
> But the existing behavior is no SIGBUS unless MCE early kill is elected, so it doesn't quite match
> the man page.
> - There is 'hwpoison-inject' which behaves similar to the existing madvise(MADV_HWPOISON), that is,
> soft inject without MF_ACTION_REQUIRED flag.
> 

I tend to agree with you. It might be a good idea to add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON).
Thanks.
.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] mm/memory-failure: send SIGBUS in the event of thp split fail
  2024-05-06 20:26     ` Jane Chu
@ 2024-05-08  8:08       ` Miaohe Lin
  2024-05-08 17:45         ` Jane Chu
  0 siblings, 1 reply; 26+ messages in thread
From: Miaohe Lin @ 2024-05-08  8:08 UTC (permalink / raw
  To: Jane Chu, nao.horiguchi, akpm, linux-mm, linux-kernel

On 2024/5/7 4:26, Jane Chu wrote:
> On 5/5/2024 12:00 AM, Miaohe Lin wrote:
> 
>> On 2024/5/2 7:24, Jane Chu wrote:
>>> When handle hwpoison in a GUP longterm pin'ed thp page,
>>> try_to_split_thp_page() will fail. And at this point, there is little else
>>> the kernel could do except sending a SIGBUS to the user process, thus
>>> give it a chance to recover.
>>>
>>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>> Thanks for your patch. Some comments below.
>>
>>> ---
>>>   mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 36 insertions(+)
>>>
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index 7fcf182abb96..67f4d24a98e7 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>>>       return rc;
>>>   }
>>>   +/*
>>> + * The calling condition is as such: thp split failed, page might have
>>> + * been GUP longterm pinned, not much can be done for recovery.
>>> + * But a SIGBUS should be delivered with vaddr provided so that the user
>>> + * application has a chance to recover. Also, application processes'
>>> + * election for MCE early killed will be honored.
>>> + */
>>> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
>>> +            struct page *hpage)
>>> +{
>>> +    struct folio *folio = page_folio(hpage);
>>> +    LIST_HEAD(tokill);
>>> +    int res = -EHWPOISON;
>>> +
>>> +    /* deal with user pages only */
>>> +    if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p))
>>> +        res = -EBUSY;
>>> +    if (!(PageLRU(hpage) || PageHuge(p)))
>>> +        res = -EBUSY;
>> Above checks seems unneeded. We already know it's thp?
> 
> Agreed.
> 
> I  lifted these checks from hwpoison_user_mapping() with a hope to make kill_procs_now() more generic,
> 
> such as, potentially replacing kill_accessing_processes() for re-accessing hwpoisoned page.
> 
> But I backed out at last, due to concerns that my tests might not have covered sufficient number of scenarios.
> 
>>
>>> +
>>> +    if (res == -EHWPOISON) {
>>> +        collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
>>> +        kill_procs(&tokill, true, pfn, flags);
>>> +    }
>>> +
>>> +    if (flags & MF_COUNT_INCREASED)
>>> +        put_page(p);
>> This if block is broken. put_page() has been done when try_to_split_thp_page() fails?
> 
> put_page() has not been done if try_to_split_thp_page() fails, and I think it should.

In try_to_split_thp_page(), if split_huge_page fails, i.e. ret != 0, put_page() is called. See below:

static int try_to_split_thp_page(struct page *page)
{
	int ret;

	lock_page(page);
	ret = split_huge_page(page);
	unlock_page(page);

	if (unlikely(ret))
		put_page(page);
	^^^^^^^^^^^^^^^^^^^^^^^
	return ret;
}

Or am I miss something?

> 
> I will revise the code so that put_page() is called regardless MF_ACTION_REQUIRED is set or not.
> 
>>
>>> +
>> action_result is missing?
> 
> Indeed,  action_result() isn't always called, referring to the re-accessing hwpoison scenarios.
> 
> In this case, I think the reason  is that, we just killed the process and there is nothing
> 
> else to do or to report.
> 
>>
>>> +    return res;
>>> +}
>>> +
>>>   /**
>>>    * memory_failure - Handle memory failure of a page.
>>>    * @pfn: Page Number of the corrupted page
>>> @@ -2297,6 +2328,11 @@ int memory_failure(unsigned long pfn, int flags)
>>>            */
>>>           SetPageHasHWPoisoned(hpage);
>>>           if (try_to_split_thp_page(p) < 0) {
>> Should hwpoison_filter() be called in this case?
> Yes, it should. I will add the hwpoison_filter check.
>>
>>> +            if (flags & MF_ACTION_REQUIRED) {

Only in MF_ACTION_REQUIRED case, SIGBUS is sent to processes when thp split failed. Any reson under it?

Thanks.
.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] mm/memory-failure: send SIGBUS in the event of thp split fail
  2024-05-01 23:24 ` [PATCH 3/3] mm/memory-failure: send SIGBUS in the event of thp split fail Jane Chu
  2024-05-05  7:00   ` Miaohe Lin
@ 2024-05-08  9:03   ` Miaohe Lin
  2024-05-08 16:56     ` Jane Chu
  1 sibling, 1 reply; 26+ messages in thread
From: Miaohe Lin @ 2024-05-08  9:03 UTC (permalink / raw
  To: Jane Chu, nao.horiguchi, akpm, linux-mm, linux-kernel

On 2024/5/2 7:24, Jane Chu wrote:
> When handle hwpoison in a GUP longterm pin'ed thp page,
> try_to_split_thp_page() will fail. And at this point, there is little else
> the kernel could do except sending a SIGBUS to the user process, thus
> give it a chance to recover.

It seems the user process will still receive SIGBUS via kill_accessing_process()
when (re-)access thp later. So they should have a chance to recover already.
Or am I miss something?

Thanks.
.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] mm/memory-failure: try to send SIGBUS even if unmap failed
  2024-05-07 17:54     ` Jane Chu
@ 2024-05-08 12:06       ` Oscar Salvador
  2024-05-08 16:51         ` Jane Chu
  0 siblings, 1 reply; 26+ messages in thread
From: Oscar Salvador @ 2024-05-08 12:06 UTC (permalink / raw
  To: Jane Chu; +Cc: linmiaohe, nao.horiguchi, akpm, linux-mm, linux-kernel

On Tue, May 07, 2024 at 10:54:10AM -0700, Jane Chu wrote:
> I actually managed to hit the re-access case with an older version of Linux
> -
> 
> MCE occurred, but unmap failed,  no SIGBUS and test process re-access
> 
> the same address over and over (hence MCE after MCE), as the CPU
> 
> was unable to make forward progress.   In reality, this issue is fixed with
> 
> kill_accessing_processes().  The comment for this patch refers to comment
> made

So we get a faulty page and we try to unmap it from all processes that
might have it mapped in their pgtables.
Prior to this patch we would kill the processes right away and now we
deliver a SIGBUS.

Seems safe as upon-reaccesing kill_accessing_process() will be called
for already hwpoisoned pages.

I think the changelog could be made more explicit about this scenario
and state the role of kill_accessing_process more clear.

With that: Reviewed-by: Oscar Salvador <osalvador@suse.de>
 

-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] mm/memory-failure: try to send SIGBUS even if unmap failed
  2024-05-08 12:06       ` Oscar Salvador
@ 2024-05-08 16:51         ` Jane Chu
  0 siblings, 0 replies; 26+ messages in thread
From: Jane Chu @ 2024-05-08 16:51 UTC (permalink / raw
  To: Oscar Salvador; +Cc: linmiaohe, nao.horiguchi, akpm, linux-mm, linux-kernel

On 5/8/2024 5:06 AM, Oscar Salvador wrote:

> On Tue, May 07, 2024 at 10:54:10AM -0700, Jane Chu wrote:
>> I actually managed to hit the re-access case with an older version of Linux
>> -
>>
>> MCE occurred, but unmap failed,  no SIGBUS and test process re-access
>>
>> the same address over and over (hence MCE after MCE), as the CPU
>>
>> was unable to make forward progress.   In reality, this issue is fixed with
>>
>> kill_accessing_processes().  The comment for this patch refers to comment
>> made
> So we get a faulty page and we try to unmap it from all processes that
> might have it mapped in their pgtables.
> Prior to this patch we would kill the processes right away and now we
> deliver a SIGBUS.
>
> Seems safe as upon-reaccesing kill_accessing_process() will be called
> for already hwpoisoned pages.
>
> I think the changelog could be made more explicit about this scenario
> and state the role of kill_accessing_process more clear.
>
> With that: Reviewed-by: Oscar Salvador <osalvador@suse.de>
>   

I will revise the changelog and mention kill_accessing_process().

Thanks!

-jane

>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] mm/memory-failure: send SIGBUS in the event of thp split fail
  2024-05-08  9:03   ` Miaohe Lin
@ 2024-05-08 16:56     ` Jane Chu
  2024-05-09  8:52       ` Miaohe Lin
  0 siblings, 1 reply; 26+ messages in thread
From: Jane Chu @ 2024-05-08 16:56 UTC (permalink / raw
  To: Miaohe Lin, nao.horiguchi, akpm, linux-mm, linux-kernel

On 5/8/2024 2:03 AM, Miaohe Lin wrote:

> On 2024/5/2 7:24, Jane Chu wrote:
>> When handle hwpoison in a GUP longterm pin'ed thp page,
>> try_to_split_thp_page() will fail. And at this point, there is little else
>> the kernel could do except sending a SIGBUS to the user process, thus
>> give it a chance to recover.
> It seems the user process will still receive SIGBUS via kill_accessing_process()
> when (re-)access thp later. So they should have a chance to recover already.
> Or am I miss something?

The concern is about real UE consumption in which case, it's desirable 
to kill the process ASAP without having to relying on subsequent 
access.  Also to honor processes' MCE-early-kill request. 
kill_accessing_process() is very conservative in that, it doesn't check 
other processes that have the poisoned page mapped.

thanks,

-jane

>
> Thanks.
> .
>
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] mm/memory-failure: try to send SIGBUS even if unmap failed
  2024-05-08  7:47   ` Miaohe Lin
@ 2024-05-08 16:58     ` Jane Chu
  0 siblings, 0 replies; 26+ messages in thread
From: Jane Chu @ 2024-05-08 16:58 UTC (permalink / raw
  To: Miaohe Lin, nao.horiguchi, akpm, linux-mm, linux-kernel

On 5/8/2024 12:47 AM, Miaohe Lin wrote:

> On 2024/5/2 7:24, Jane Chu wrote:
>> For years when it comes down to kill a process due to hwpoison,
>> a SIGBUS is delivered only if unmap has been successful.
>> Otherwise, a SIGKILL is delivered. And the reason for that is
>> to prevent the involved process from accessing the hwpoisoned
>> page again.
>>
>> Since then a lot has changed, a hwpoisoned page is marked and
>> upon being re-accessed, the process will be killed immediately.
>> So let's take out the '!unmap_success' factor and try to deliver
>> SIGBUS if possible.
>>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>> ---
>>   mm/memory-failure.c | 13 ++++---------
>>   1 file changed, 4 insertions(+), 9 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 9e62a00b46dd..7fcf182abb96 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -519,19 +519,14 @@ void add_to_kill_ksm(struct task_struct *tsk, struct page *p,
>>    * Also when FAIL is set do a force kill because something went
>>    * wrong earlier.
> Since @fail is removed, above comment should be removed too.
> Thanks.
> .

Good catch, will do.

Thanks!

-jane


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] mm/memory-failure: send SIGBUS in the event of thp split fail
  2024-05-08  8:08       ` Miaohe Lin
@ 2024-05-08 17:45         ` Jane Chu
  2024-05-09  8:30           ` Miaohe Lin
  0 siblings, 1 reply; 26+ messages in thread
From: Jane Chu @ 2024-05-08 17:45 UTC (permalink / raw
  To: Miaohe Lin, nao.horiguchi, akpm, linux-mm, linux-kernel

On 5/8/2024 1:08 AM, Miaohe Lin wrote:

> On 2024/5/7 4:26, Jane Chu wrote:
>> On 5/5/2024 12:00 AM, Miaohe Lin wrote:
>>
>>> On 2024/5/2 7:24, Jane Chu wrote:
>>>> When handle hwpoison in a GUP longterm pin'ed thp page,
>>>> try_to_split_thp_page() will fail. And at this point, there is little else
>>>> the kernel could do except sending a SIGBUS to the user process, thus
>>>> give it a chance to recover.
>>>>
>>>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>>> Thanks for your patch. Some comments below.
>>>
>>>> ---
>>>>    mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 36 insertions(+)
>>>>
>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>> index 7fcf182abb96..67f4d24a98e7 100644
>>>> --- a/mm/memory-failure.c
>>>> +++ b/mm/memory-failure.c
>>>> @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>>>>        return rc;
>>>>    }
>>>>    +/*
>>>> + * The calling condition is as such: thp split failed, page might have
>>>> + * been GUP longterm pinned, not much can be done for recovery.
>>>> + * But a SIGBUS should be delivered with vaddr provided so that the user
>>>> + * application has a chance to recover. Also, application processes'
>>>> + * election for MCE early killed will be honored.
>>>> + */
>>>> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
>>>> +            struct page *hpage)
>>>> +{
>>>> +    struct folio *folio = page_folio(hpage);
>>>> +    LIST_HEAD(tokill);
>>>> +    int res = -EHWPOISON;
>>>> +
>>>> +    /* deal with user pages only */
>>>> +    if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p))
>>>> +        res = -EBUSY;
>>>> +    if (!(PageLRU(hpage) || PageHuge(p)))
>>>> +        res = -EBUSY;
>>> Above checks seems unneeded. We already know it's thp?
>> Agreed.
>>
>> I  lifted these checks from hwpoison_user_mapping() with a hope to make kill_procs_now() more generic,
>>
>> such as, potentially replacing kill_accessing_processes() for re-accessing hwpoisoned page.
>>
>> But I backed out at last, due to concerns that my tests might not have covered sufficient number of scenarios.
>>
>>>> +
>>>> +    if (res == -EHWPOISON) {
>>>> +        collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
>>>> +        kill_procs(&tokill, true, pfn, flags);
>>>> +    }
>>>> +
>>>> +    if (flags & MF_COUNT_INCREASED)
>>>> +        put_page(p);
>>> This if block is broken. put_page() has been done when try_to_split_thp_page() fails?
>> put_page() has not been done if try_to_split_thp_page() fails, and I think it should.
> In try_to_split_thp_page(), if split_huge_page fails, i.e. ret != 0, put_page() is called. See below:
>
> static int try_to_split_thp_page(struct page *page)
> {
> 	int ret;
>
> 	lock_page(page);
> 	ret = split_huge_page(page);
> 	unlock_page(page);
>
> 	if (unlikely(ret))
> 		put_page(page);
> 	^^^^^^^^^^^^^^^^^^^^^^^
> 	return ret;
> }
>
> Or am I miss something?

I think you caught a bug in my code, thanks!

How about moving put_page() outside try_to_split_thp_page() ?

>
>> I will revise the code so that put_page() is called regardless MF_ACTION_REQUIRED is set or not.
>>
>>>> +
>>> action_result is missing?
>> Indeed,  action_result() isn't always called, referring to the re-accessing hwpoison scenarios.
>>
>> In this case, I think the reason  is that, we just killed the process and there is nothing
>>
>> else to do or to report.
>>
>>>> +    return res;
>>>> +}
>>>> +
>>>>    /**
>>>>     * memory_failure - Handle memory failure of a page.
>>>>     * @pfn: Page Number of the corrupted page
>>>> @@ -2297,6 +2328,11 @@ int memory_failure(unsigned long pfn, int flags)
>>>>             */
>>>>            SetPageHasHWPoisoned(hpage);
>>>>            if (try_to_split_thp_page(p) < 0) {
>>> Should hwpoison_filter() be called in this case?
>> Yes, it should. I will add the hwpoison_filter check.
>>>> +            if (flags & MF_ACTION_REQUIRED) {
> Only in MF_ACTION_REQUIRED case, SIGBUS is sent to processes when thp split failed. Any reson under it?

I took a clue from kill_accessing_process() which is invoked only if 
MF_ACTION_REQUIRED is set.

The usual code path for delivery signal is

if page-is-dirty or MF_MUST_KILL-is-set or umap-failed, then

- send SIGKILL if vaddr is -EFAULT

- send SIGBUS with BUS_MCEERR_AR if MF_ACTION_REQUIRED is set

- send SIGBUS with BUS_MCEERR_AO if MF_ACTION_REQUIRED is not set and 
process elected for MCE-early-kill

So, if kill_procs_now() is invoked only if MF_ACTION_REQUIRED (as it is 
in the patch), one can argue that

the MCE-early-kill request is not honored which deviates from the 
existing behavior.

Perhaps I should remove the

+ if (flags & MF_ACTION_REQUIRED) {

check.

thanks!

-jane



>
> Thanks.
> .

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] mm/memory-failure: try to send SIGBUS even if unmap failed
  2024-05-01 23:24 ` [PATCH 1/3] mm/memory-failure: try to send SIGBUS even if unmap failed Jane Chu
  2024-05-07  9:02   ` Oscar Salvador
  2024-05-08  7:47   ` Miaohe Lin
@ 2024-05-09  2:54   ` Miaohe Lin
  2024-05-09 16:40     ` Jane Chu
  2 siblings, 1 reply; 26+ messages in thread
From: Miaohe Lin @ 2024-05-09  2:54 UTC (permalink / raw
  To: Jane Chu, nao.horiguchi, akpm, linux-mm, linux-kernel

On 2024/5/2 7:24, Jane Chu wrote:
> For years when it comes down to kill a process due to hwpoison,
> a SIGBUS is delivered only if unmap has been successful.
> Otherwise, a SIGKILL is delivered. And the reason for that is
> to prevent the involved process from accessing the hwpoisoned
> page again.
> 
> Since then a lot has changed, a hwpoisoned page is marked and
> upon being re-accessed, the process will be killed immediately.
> So let's take out the '!unmap_success' factor and try to deliver
> SIGBUS if possible.
> 
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---
>  mm/memory-failure.c | 13 ++++---------
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 9e62a00b46dd..7fcf182abb96 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -519,19 +519,14 @@ void add_to_kill_ksm(struct task_struct *tsk, struct page *p,
>   * Also when FAIL is set do a force kill because something went
>   * wrong earlier.
>   */
> -static void kill_procs(struct list_head *to_kill, int forcekill, bool fail,
> +static void kill_procs(struct list_head *to_kill, int forcekill,
>  		unsigned long pfn, int flags)
>  {
>  	struct to_kill *tk, *next;
>  
>  	list_for_each_entry_safe(tk, next, to_kill, nd) {
>  		if (forcekill) {
> -			/*
> -			 * In case something went wrong with munmapping
> -			 * make sure the process doesn't catch the
> -			 * signal and then access the memory. Just kill it.
> -			 */
> -			if (fail || tk->addr == -EFAULT) {
> +			if (tk->addr == -EFAULT) {
>  				pr_err("%#lx: forcibly killing %s:%d because of failure to unmap corrupted page\n",
>  				       pfn, tk->tsk->comm, tk->tsk->pid);
>  				do_send_sig_info(SIGKILL, SEND_SIG_PRIV,
> @@ -1666,7 +1661,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>  	 */

There is comment above the forcekill saying:

    When there was a problem unmapping earlier use a more force-full
uncatchable kill to prevent any accesses to the poisoned memory.

This might need to be changed too.
Thanks.
.

>  	forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL) ||
>  		    !unmap_success;
> -	kill_procs(&tokill, forcekill, !unmap_success, pfn, flags);
> +	kill_procs(&tokill, forcekill, pfn, flags);
>  
>  	return unmap_success;
>  }
> @@ -1730,7 +1725,7 @@ static void unmap_and_kill(struct list_head *to_kill, unsigned long pfn,
>  		unmap_mapping_range(mapping, start, size, 0);
>  	}
>  
> -	kill_procs(to_kill, flags & MF_MUST_KILL, false, pfn, flags);
> +	kill_procs(to_kill, flags & MF_MUST_KILL, pfn, flags);
>  }
>  
>  /*
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] mm/memory-failure: send SIGBUS in the event of thp split fail
  2024-05-08 17:45         ` Jane Chu
@ 2024-05-09  8:30           ` Miaohe Lin
  2024-05-09 15:34             ` Jane Chu
  0 siblings, 1 reply; 26+ messages in thread
From: Miaohe Lin @ 2024-05-09  8:30 UTC (permalink / raw
  To: Jane Chu, nao.horiguchi, akpm, linux-mm, linux-kernel

On 2024/5/9 1:45, Jane Chu wrote:
> On 5/8/2024 1:08 AM, Miaohe Lin wrote:
> 
>> On 2024/5/7 4:26, Jane Chu wrote:
>>> On 5/5/2024 12:00 AM, Miaohe Lin wrote:
>>>
>>>> On 2024/5/2 7:24, Jane Chu wrote:
>>>>> When handle hwpoison in a GUP longterm pin'ed thp page,
>>>>> try_to_split_thp_page() will fail. And at this point, there is little else
>>>>> the kernel could do except sending a SIGBUS to the user process, thus
>>>>> give it a chance to recover.
>>>>>
>>>>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>>>> Thanks for your patch. Some comments below.
>>>>
>>>>> ---
>>>>>    mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
>>>>>    1 file changed, 36 insertions(+)
>>>>>
>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>>> index 7fcf182abb96..67f4d24a98e7 100644
>>>>> --- a/mm/memory-failure.c
>>>>> +++ b/mm/memory-failure.c
>>>>> @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>>>>>        return rc;
>>>>>    }
>>>>>    +/*
>>>>> + * The calling condition is as such: thp split failed, page might have
>>>>> + * been GUP longterm pinned, not much can be done for recovery.
>>>>> + * But a SIGBUS should be delivered with vaddr provided so that the user
>>>>> + * application has a chance to recover. Also, application processes'
>>>>> + * election for MCE early killed will be honored.
>>>>> + */
>>>>> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
>>>>> +            struct page *hpage)
>>>>> +{
>>>>> +    struct folio *folio = page_folio(hpage);
>>>>> +    LIST_HEAD(tokill);
>>>>> +    int res = -EHWPOISON;
>>>>> +
>>>>> +    /* deal with user pages only */
>>>>> +    if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p))
>>>>> +        res = -EBUSY;
>>>>> +    if (!(PageLRU(hpage) || PageHuge(p)))
>>>>> +        res = -EBUSY;
>>>> Above checks seems unneeded. We already know it's thp?
>>> Agreed.
>>>
>>> I  lifted these checks from hwpoison_user_mapping() with a hope to make kill_procs_now() more generic,
>>>
>>> such as, potentially replacing kill_accessing_processes() for re-accessing hwpoisoned page.
>>>
>>> But I backed out at last, due to concerns that my tests might not have covered sufficient number of scenarios.
>>>
>>>>> +
>>>>> +    if (res == -EHWPOISON) {
>>>>> +        collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
>>>>> +        kill_procs(&tokill, true, pfn, flags);
>>>>> +    }
>>>>> +
>>>>> +    if (flags & MF_COUNT_INCREASED)
>>>>> +        put_page(p);
>>>> This if block is broken. put_page() has been done when try_to_split_thp_page() fails?
>>> put_page() has not been done if try_to_split_thp_page() fails, and I think it should.
>> In try_to_split_thp_page(), if split_huge_page fails, i.e. ret != 0, put_page() is called. See below:
>>
>> static int try_to_split_thp_page(struct page *page)
>> {
>>     int ret;
>>
>>     lock_page(page);
>>     ret = split_huge_page(page);
>>     unlock_page(page);
>>
>>     if (unlikely(ret))
>>         put_page(page);
>>     ^^^^^^^^^^^^^^^^^^^^^^^
>>     return ret;
>> }
>>
>> Or am I miss something?
> 
> I think you caught a bug in my code, thanks!
> 
> How about moving put_page() outside try_to_split_thp_page() ?

If you want to send SIGBUS in the event of thp split fail, it might be required to do so.
I think kill_procs_now() needs extra thp refcnt to do its work.

> 
>>
>>> I will revise the code so that put_page() is called regardless MF_ACTION_REQUIRED is set or not.
>>>
>>>>> +
>>>> action_result is missing?
>>> Indeed,  action_result() isn't always called, referring to the re-accessing hwpoison scenarios.
>>>
>>> In this case, I think the reason  is that, we just killed the process and there is nothing
>>>
>>> else to do or to report.
>>>
>>>>> +    return res;
>>>>> +}
>>>>> +
>>>>>    /**
>>>>>     * memory_failure - Handle memory failure of a page.
>>>>>     * @pfn: Page Number of the corrupted page
>>>>> @@ -2297,6 +2328,11 @@ int memory_failure(unsigned long pfn, int flags)
>>>>>             */
>>>>>            SetPageHasHWPoisoned(hpage);
>>>>>            if (try_to_split_thp_page(p) < 0) {
>>>> Should hwpoison_filter() be called in this case?
>>> Yes, it should. I will add the hwpoison_filter check.
>>>>> +            if (flags & MF_ACTION_REQUIRED) {
>> Only in MF_ACTION_REQUIRED case, SIGBUS is sent to processes when thp split failed. Any reson under it?
> 
> I took a clue from kill_accessing_process() which is invoked only if MF_ACTION_REQUIRED is set.
> 
> The usual code path for delivery signal is
> 
> if page-is-dirty or MF_MUST_KILL-is-set or umap-failed, then
> 
> - send SIGKILL if vaddr is -EFAULT
> 
> - send SIGBUS with BUS_MCEERR_AR if MF_ACTION_REQUIRED is set
> 
> - send SIGBUS with BUS_MCEERR_AO if MF_ACTION_REQUIRED is not set and process elected for MCE-early-kill
> 
> So, if kill_procs_now() is invoked only if MF_ACTION_REQUIRED (as it is in the patch), one can argue that
> 
> the MCE-early-kill request is not honored which deviates from the existing behavior.
> 
> Perhaps I should remove the
> 
> + if (flags & MF_ACTION_REQUIRED) {

I tend to agree MCE-early-kill request should be honored when try to kill process.
Thanks.
.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] mm/memory-failure: send SIGBUS in the event of thp split fail
  2024-05-08 16:56     ` Jane Chu
@ 2024-05-09  8:52       ` Miaohe Lin
  0 siblings, 0 replies; 26+ messages in thread
From: Miaohe Lin @ 2024-05-09  8:52 UTC (permalink / raw
  To: Jane Chu, nao.horiguchi, akpm, linux-mm, linux-kernel

On 2024/5/9 0:56, Jane Chu wrote:
> On 5/8/2024 2:03 AM, Miaohe Lin wrote:
> 
>> On 2024/5/2 7:24, Jane Chu wrote:
>>> When handle hwpoison in a GUP longterm pin'ed thp page,
>>> try_to_split_thp_page() will fail. And at this point, there is little else
>>> the kernel could do except sending a SIGBUS to the user process, thus
>>> give it a chance to recover.
>> It seems the user process will still receive SIGBUS via kill_accessing_process()
>> when (re-)access thp later. So they should have a chance to recover already.
>> Or am I miss something?
> 
> The concern is about real UE consumption in which case, it's desirable to kill the process ASAP without having to relying on subsequent access.  Also to honor processes' MCE-early-kill request. kill_accessing_process() is very conservative in that, it doesn't check other processes that have the poisoned page mapped.

I see. Thanks for your explanation.
Thanks.
.

> 
> thanks,
> 
> -jane
> 
>>
>> Thanks.
>> .
>>
>>
> .


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] mm/memory-failure: send SIGBUS in the event of thp split fail
  2024-05-09  8:30           ` Miaohe Lin
@ 2024-05-09 15:34             ` Jane Chu
  2024-05-10  2:59               ` Miaohe Lin
  0 siblings, 1 reply; 26+ messages in thread
From: Jane Chu @ 2024-05-09 15:34 UTC (permalink / raw
  To: Miaohe Lin, nao.horiguchi, akpm, linux-mm, linux-kernel


On 5/9/2024 1:30 AM, Miaohe Lin wrote:
> On 2024/5/9 1:45, Jane Chu wrote:
>> On 5/8/2024 1:08 AM, Miaohe Lin wrote:
>>
>>> On 2024/5/7 4:26, Jane Chu wrote:
>>>> On 5/5/2024 12:00 AM, Miaohe Lin wrote:
>>>>
>>>>> On 2024/5/2 7:24, Jane Chu wrote:
>>>>>> When handle hwpoison in a GUP longterm pin'ed thp page,
>>>>>> try_to_split_thp_page() will fail. And at this point, there is little else
>>>>>> the kernel could do except sending a SIGBUS to the user process, thus
>>>>>> give it a chance to recover.
>>>>>>
>>>>>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>>>>> Thanks for your patch. Some comments below.
>>>>>
>>>>>> ---
>>>>>>     mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
>>>>>>     1 file changed, 36 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>>>> index 7fcf182abb96..67f4d24a98e7 100644
>>>>>> --- a/mm/memory-failure.c
>>>>>> +++ b/mm/memory-failure.c
>>>>>> @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>>>>>>         return rc;
>>>>>>     }
>>>>>>     +/*
>>>>>> + * The calling condition is as such: thp split failed, page might have
>>>>>> + * been GUP longterm pinned, not much can be done for recovery.
>>>>>> + * But a SIGBUS should be delivered with vaddr provided so that the user
>>>>>> + * application has a chance to recover. Also, application processes'
>>>>>> + * election for MCE early killed will be honored.
>>>>>> + */
>>>>>> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
>>>>>> +            struct page *hpage)
>>>>>> +{
>>>>>> +    struct folio *folio = page_folio(hpage);
>>>>>> +    LIST_HEAD(tokill);
>>>>>> +    int res = -EHWPOISON;
>>>>>> +
>>>>>> +    /* deal with user pages only */
>>>>>> +    if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p))
>>>>>> +        res = -EBUSY;
>>>>>> +    if (!(PageLRU(hpage) || PageHuge(p)))
>>>>>> +        res = -EBUSY;
>>>>> Above checks seems unneeded. We already know it's thp?
>>>> Agreed.
>>>>
>>>> I  lifted these checks from hwpoison_user_mapping() with a hope to make kill_procs_now() more generic,
>>>>
>>>> such as, potentially replacing kill_accessing_processes() for re-accessing hwpoisoned page.
>>>>
>>>> But I backed out at last, due to concerns that my tests might not have covered sufficient number of scenarios.
>>>>
>>>>>> +
>>>>>> +    if (res == -EHWPOISON) {
>>>>>> +        collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
>>>>>> +        kill_procs(&tokill, true, pfn, flags);
>>>>>> +    }
>>>>>> +
>>>>>> +    if (flags & MF_COUNT_INCREASED)
>>>>>> +        put_page(p);
>>>>> This if block is broken. put_page() has been done when try_to_split_thp_page() fails?
>>>> put_page() has not been done if try_to_split_thp_page() fails, and I think it should.
>>> In try_to_split_thp_page(), if split_huge_page fails, i.e. ret != 0, put_page() is called. See below:
>>>
>>> static int try_to_split_thp_page(struct page *page)
>>> {
>>>      int ret;
>>>
>>>      lock_page(page);
>>>      ret = split_huge_page(page);
>>>      unlock_page(page);
>>>
>>>      if (unlikely(ret))
>>>          put_page(page);
>>>      ^^^^^^^^^^^^^^^^^^^^^^^
>>>      return ret;
>>> }
>>>
>>> Or am I miss something?
>> I think you caught a bug in my code, thanks!
>>
>> How about moving put_page() outside try_to_split_thp_page() ?
> If you want to send SIGBUS in the event of thp split fail, it might be required to do so.
> I think kill_procs_now() needs extra thp refcnt to do its work.

Agreed.  I added an boolean to try_to_split_thp_page(),the boolean 
indicates whether to put_page().

In case of kill_procs_now(), put_page() is called afterwards.

>
>>>> I will revise the code so that put_page() is called regardless MF_ACTION_REQUIRED is set or not.
>>>>
>>>>>> +
>>>>> action_result is missing?
>>>> Indeed,  action_result() isn't always called, referring to the re-accessing hwpoison scenarios.
>>>>
>>>> In this case, I think the reason  is that, we just killed the process and there is nothing
>>>>
>>>> else to do or to report.
>>>>
>>>>>> +    return res;
>>>>>> +}
>>>>>> +
>>>>>>     /**
>>>>>>      * memory_failure - Handle memory failure of a page.
>>>>>>      * @pfn: Page Number of the corrupted page
>>>>>> @@ -2297,6 +2328,11 @@ int memory_failure(unsigned long pfn, int flags)
>>>>>>              */
>>>>>>             SetPageHasHWPoisoned(hpage);
>>>>>>             if (try_to_split_thp_page(p) < 0) {
>>>>> Should hwpoison_filter() be called in this case?
>>>> Yes, it should. I will add the hwpoison_filter check.
>>>>>> +            if (flags & MF_ACTION_REQUIRED) {
>>> Only in MF_ACTION_REQUIRED case, SIGBUS is sent to processes when thp split failed. Any reson under it?
>> I took a clue from kill_accessing_process() which is invoked only if MF_ACTION_REQUIRED is set.
>>
>> The usual code path for delivery signal is
>>
>> if page-is-dirty or MF_MUST_KILL-is-set or umap-failed, then
>>
>> - send SIGKILL if vaddr is -EFAULT
>>
>> - send SIGBUS with BUS_MCEERR_AR if MF_ACTION_REQUIRED is set
>>
>> - send SIGBUS with BUS_MCEERR_AO if MF_ACTION_REQUIRED is not set and process elected for MCE-early-kill
>>
>> So, if kill_procs_now() is invoked only if MF_ACTION_REQUIRED (as it is in the patch), one can argue that
>>
>> the MCE-early-kill request is not honored which deviates from the existing behavior.
>>
>> Perhaps I should remove the
>>
>> + if (flags & MF_ACTION_REQUIRED) {
> I tend to agree MCE-early-kill request should be honored when try to kill process.
> Thanks.
> .

Thanks,

-jane

>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] mm/memory-failure: try to send SIGBUS even if unmap failed
  2024-05-09  2:54   ` Miaohe Lin
@ 2024-05-09 16:40     ` Jane Chu
  0 siblings, 0 replies; 26+ messages in thread
From: Jane Chu @ 2024-05-09 16:40 UTC (permalink / raw
  To: Miaohe Lin, nao.horiguchi, akpm, linux-mm, linux-kernel

On 5/8/2024 7:54 PM, Miaohe Lin wrote:

> On 2024/5/2 7:24, Jane Chu wrote:
>> For years when it comes down to kill a process due to hwpoison,
>> a SIGBUS is delivered only if unmap has been successful.
>> Otherwise, a SIGKILL is delivered. And the reason for that is
>> to prevent the involved process from accessing the hwpoisoned
>> page again.
>>
>> Since then a lot has changed, a hwpoisoned page is marked and
>> upon being re-accessed, the process will be killed immediately.
>> So let's take out the '!unmap_success' factor and try to deliver
>> SIGBUS if possible.
>>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>> ---
>>   mm/memory-failure.c | 13 ++++---------
>>   1 file changed, 4 insertions(+), 9 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 9e62a00b46dd..7fcf182abb96 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -519,19 +519,14 @@ void add_to_kill_ksm(struct task_struct *tsk, struct page *p,
>>    * Also when FAIL is set do a force kill because something went
>>    * wrong earlier.
>>    */
>> -static void kill_procs(struct list_head *to_kill, int forcekill, bool fail,
>> +static void kill_procs(struct list_head *to_kill, int forcekill,
>>   		unsigned long pfn, int flags)
>>   {
>>   	struct to_kill *tk, *next;
>>   
>>   	list_for_each_entry_safe(tk, next, to_kill, nd) {
>>   		if (forcekill) {
>> -			/*
>> -			 * In case something went wrong with munmapping
>> -			 * make sure the process doesn't catch the
>> -			 * signal and then access the memory. Just kill it.
>> -			 */
>> -			if (fail || tk->addr == -EFAULT) {
>> +			if (tk->addr == -EFAULT) {
>>   				pr_err("%#lx: forcibly killing %s:%d because of failure to unmap corrupted page\n",
>>   				       pfn, tk->tsk->comm, tk->tsk->pid);
>>   				do_send_sig_info(SIGKILL, SEND_SIG_PRIV,
>> @@ -1666,7 +1661,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>>   	 */
> There is comment above the forcekill saying:
>
>      When there was a problem unmapping earlier use a more force-full
> uncatchable kill to prevent any accesses to the poisoned memory.
>
> This might need to be changed too.

Yes, will do.

thanks!

-jane

> Thanks.
> .
>
>>   	forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL) ||
>>   		    !unmap_success;
>> -	kill_procs(&tokill, forcekill, !unmap_success, pfn, flags);
>> +	kill_procs(&tokill, forcekill, pfn, flags);
>>   
>>   	return unmap_success;
>>   }
>> @@ -1730,7 +1725,7 @@ static void unmap_and_kill(struct list_head *to_kill, unsigned long pfn,
>>   		unmap_mapping_range(mapping, start, size, 0);
>>   	}
>>   
>> -	kill_procs(to_kill, flags & MF_MUST_KILL, false, pfn, flags);
>> +	kill_procs(to_kill, flags & MF_MUST_KILL, pfn, flags);
>>   }
>>   
>>   /*
>>
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] mm/memory-failure: send SIGBUS in the event of thp split fail
  2024-05-09 15:34             ` Jane Chu
@ 2024-05-10  2:59               ` Miaohe Lin
  2024-05-10  3:18                 ` Jane Chu
  0 siblings, 1 reply; 26+ messages in thread
From: Miaohe Lin @ 2024-05-10  2:59 UTC (permalink / raw
  To: Jane Chu, nao.horiguchi, akpm, linux-mm, linux-kernel

On 2024/5/9 23:34, Jane Chu wrote:
> 
> On 5/9/2024 1:30 AM, Miaohe Lin wrote:
>> On 2024/5/9 1:45, Jane Chu wrote:
>>> On 5/8/2024 1:08 AM, Miaohe Lin wrote:
>>>
>>>> On 2024/5/7 4:26, Jane Chu wrote:
>>>>> On 5/5/2024 12:00 AM, Miaohe Lin wrote:
>>>>>
>>>>>> On 2024/5/2 7:24, Jane Chu wrote:
>>>>>>> When handle hwpoison in a GUP longterm pin'ed thp page,
>>>>>>> try_to_split_thp_page() will fail. And at this point, there is little else
>>>>>>> the kernel could do except sending a SIGBUS to the user process, thus
>>>>>>> give it a chance to recover.
>>>>>>>
>>>>>>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>>>>>> Thanks for your patch. Some comments below.
>>>>>>
>>>>>>> ---
>>>>>>>     mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
>>>>>>>     1 file changed, 36 insertions(+)
>>>>>>>
>>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>>>>> index 7fcf182abb96..67f4d24a98e7 100644
>>>>>>> --- a/mm/memory-failure.c
>>>>>>> +++ b/mm/memory-failure.c
>>>>>>> @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>>>>>>>         return rc;
>>>>>>>     }
>>>>>>>     +/*
>>>>>>> + * The calling condition is as such: thp split failed, page might have
>>>>>>> + * been GUP longterm pinned, not much can be done for recovery.
>>>>>>> + * But a SIGBUS should be delivered with vaddr provided so that the user
>>>>>>> + * application has a chance to recover. Also, application processes'
>>>>>>> + * election for MCE early killed will be honored.
>>>>>>> + */
>>>>>>> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
>>>>>>> +            struct page *hpage)
>>>>>>> +{
>>>>>>> +    struct folio *folio = page_folio(hpage);
>>>>>>> +    LIST_HEAD(tokill);
>>>>>>> +    int res = -EHWPOISON;
>>>>>>> +
>>>>>>> +    /* deal with user pages only */
>>>>>>> +    if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p))
>>>>>>> +        res = -EBUSY;
>>>>>>> +    if (!(PageLRU(hpage) || PageHuge(p)))
>>>>>>> +        res = -EBUSY;
>>>>>> Above checks seems unneeded. We already know it's thp?
>>>>> Agreed.
>>>>>
>>>>> I  lifted these checks from hwpoison_user_mapping() with a hope to make kill_procs_now() more generic,
>>>>>
>>>>> such as, potentially replacing kill_accessing_processes() for re-accessing hwpoisoned page.
>>>>>
>>>>> But I backed out at last, due to concerns that my tests might not have covered sufficient number of scenarios.
>>>>>
>>>>>>> +
>>>>>>> +    if (res == -EHWPOISON) {
>>>>>>> +        collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
>>>>>>> +        kill_procs(&tokill, true, pfn, flags);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    if (flags & MF_COUNT_INCREASED)
>>>>>>> +        put_page(p);
>>>>>> This if block is broken. put_page() has been done when try_to_split_thp_page() fails?
>>>>> put_page() has not been done if try_to_split_thp_page() fails, and I think it should.
>>>> In try_to_split_thp_page(), if split_huge_page fails, i.e. ret != 0, put_page() is called. See below:
>>>>
>>>> static int try_to_split_thp_page(struct page *page)
>>>> {
>>>>      int ret;
>>>>
>>>>      lock_page(page);
>>>>      ret = split_huge_page(page);
>>>>      unlock_page(page);
>>>>
>>>>      if (unlikely(ret))
>>>>          put_page(page);
>>>>      ^^^^^^^^^^^^^^^^^^^^^^^
>>>>      return ret;
>>>> }
>>>>
>>>> Or am I miss something?
>>> I think you caught a bug in my code, thanks!
>>>
>>> How about moving put_page() outside try_to_split_thp_page() ?
>> If you want to send SIGBUS in the event of thp split fail, it might be required to do so.
>> I think kill_procs_now() needs extra thp refcnt to do its work.
> 
> Agreed.  I added an boolean to try_to_split_thp_page(),the boolean indicates whether to put_page().

IMHO, it might be too complicated to add an extra boolean to indicate whether to put_page(). It might be
more straightforward to always put_page outside try_to_split_thp_page?
Thanks.
.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] mm/memory-failure: send SIGBUS in the event of thp split fail
  2024-05-10  2:59               ` Miaohe Lin
@ 2024-05-10  3:18                 ` Jane Chu
  0 siblings, 0 replies; 26+ messages in thread
From: Jane Chu @ 2024-05-10  3:18 UTC (permalink / raw
  To: Miaohe Lin, nao.horiguchi, akpm, linux-mm, linux-kernel


On 5/9/2024 7:59 PM, Miaohe Lin wrote:
> On 2024/5/9 23:34, Jane Chu wrote:
>> On 5/9/2024 1:30 AM, Miaohe Lin wrote:
>>> On 2024/5/9 1:45, Jane Chu wrote:
>>>> On 5/8/2024 1:08 AM, Miaohe Lin wrote:
>>>>
>>>>> On 2024/5/7 4:26, Jane Chu wrote:
>>>>>> On 5/5/2024 12:00 AM, Miaohe Lin wrote:
>>>>>>
>>>>>>> On 2024/5/2 7:24, Jane Chu wrote:
>>>>>>>> When handle hwpoison in a GUP longterm pin'ed thp page,
>>>>>>>> try_to_split_thp_page() will fail. And at this point, there is little else
>>>>>>>> the kernel could do except sending a SIGBUS to the user process, thus
>>>>>>>> give it a chance to recover.
>>>>>>>>
>>>>>>>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
>>>>>>> Thanks for your patch. Some comments below.
>>>>>>>
>>>>>>>> ---
>>>>>>>>      mm/memory-failure.c | 36 ++++++++++++++++++++++++++++++++++++
>>>>>>>>      1 file changed, 36 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>>>>>> index 7fcf182abb96..67f4d24a98e7 100644
>>>>>>>> --- a/mm/memory-failure.c
>>>>>>>> +++ b/mm/memory-failure.c
>>>>>>>> @@ -2168,6 +2168,37 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>>>>>>>>          return rc;
>>>>>>>>      }
>>>>>>>>      +/*
>>>>>>>> + * The calling condition is as such: thp split failed, page might have
>>>>>>>> + * been GUP longterm pinned, not much can be done for recovery.
>>>>>>>> + * But a SIGBUS should be delivered with vaddr provided so that the user
>>>>>>>> + * application has a chance to recover. Also, application processes'
>>>>>>>> + * election for MCE early killed will be honored.
>>>>>>>> + */
>>>>>>>> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
>>>>>>>> +            struct page *hpage)
>>>>>>>> +{
>>>>>>>> +    struct folio *folio = page_folio(hpage);
>>>>>>>> +    LIST_HEAD(tokill);
>>>>>>>> +    int res = -EHWPOISON;
>>>>>>>> +
>>>>>>>> +    /* deal with user pages only */
>>>>>>>> +    if (PageReserved(p) || PageSlab(p) || PageTable(p) || PageOffline(p))
>>>>>>>> +        res = -EBUSY;
>>>>>>>> +    if (!(PageLRU(hpage) || PageHuge(p)))
>>>>>>>> +        res = -EBUSY;
>>>>>>> Above checks seems unneeded. We already know it's thp?
>>>>>> Agreed.
>>>>>>
>>>>>> I  lifted these checks from hwpoison_user_mapping() with a hope to make kill_procs_now() more generic,
>>>>>>
>>>>>> such as, potentially replacing kill_accessing_processes() for re-accessing hwpoisoned page.
>>>>>>
>>>>>> But I backed out at last, due to concerns that my tests might not have covered sufficient number of scenarios.
>>>>>>
>>>>>>>> +
>>>>>>>> +    if (res == -EHWPOISON) {
>>>>>>>> +        collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
>>>>>>>> +        kill_procs(&tokill, true, pfn, flags);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (flags & MF_COUNT_INCREASED)
>>>>>>>> +        put_page(p);
>>>>>>> This if block is broken. put_page() has been done when try_to_split_thp_page() fails?
>>>>>> put_page() has not been done if try_to_split_thp_page() fails, and I think it should.
>>>>> In try_to_split_thp_page(), if split_huge_page fails, i.e. ret != 0, put_page() is called. See below:
>>>>>
>>>>> static int try_to_split_thp_page(struct page *page)
>>>>> {
>>>>>       int ret;
>>>>>
>>>>>       lock_page(page);
>>>>>       ret = split_huge_page(page);
>>>>>       unlock_page(page);
>>>>>
>>>>>       if (unlikely(ret))
>>>>>           put_page(page);
>>>>>       ^^^^^^^^^^^^^^^^^^^^^^^
>>>>>       return ret;
>>>>> }
>>>>>
>>>>> Or am I miss something?
>>>> I think you caught a bug in my code, thanks!
>>>>
>>>> How about moving put_page() outside try_to_split_thp_page() ?
>>> If you want to send SIGBUS in the event of thp split fail, it might be required to do so.
>>> I think kill_procs_now() needs extra thp refcnt to do its work.
>> Agreed.  I added an boolean to try_to_split_thp_page(),the boolean indicates whether to put_page().
> IMHO, it might be too complicated to add an extra boolean to indicate whether to put_page(). It might be
> more straightforward to always put_page outside try_to_split_thp_page?

Looks okay to me, let's see.  Will send out v2 in a while.

thanks,

-jane

> Thanks.
> .
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2024-05-10  3:19 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-01 23:24 [PATCH 0/3] Enhance soft hwpoison handling and injection Jane Chu
2024-05-01 23:24 ` [PATCH 1/3] mm/memory-failure: try to send SIGBUS even if unmap failed Jane Chu
2024-05-07  9:02   ` Oscar Salvador
2024-05-07 17:54     ` Jane Chu
2024-05-08 12:06       ` Oscar Salvador
2024-05-08 16:51         ` Jane Chu
2024-05-08  7:47   ` Miaohe Lin
2024-05-08 16:58     ` Jane Chu
2024-05-09  2:54   ` Miaohe Lin
2024-05-09 16:40     ` Jane Chu
2024-05-01 23:24 ` [PATCH 2/3] mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON) Jane Chu
2024-05-05  7:02   ` Miaohe Lin
2024-05-06 19:54     ` Jane Chu
2024-05-08  7:58       ` Miaohe Lin
2024-05-01 23:24 ` [PATCH 3/3] mm/memory-failure: send SIGBUS in the event of thp split fail Jane Chu
2024-05-05  7:00   ` Miaohe Lin
2024-05-06 20:26     ` Jane Chu
2024-05-08  8:08       ` Miaohe Lin
2024-05-08 17:45         ` Jane Chu
2024-05-09  8:30           ` Miaohe Lin
2024-05-09 15:34             ` Jane Chu
2024-05-10  2:59               ` Miaohe Lin
2024-05-10  3:18                 ` Jane Chu
2024-05-08  9:03   ` Miaohe Lin
2024-05-08 16:56     ` Jane Chu
2024-05-09  8:52       ` Miaohe Lin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).