NVDIMM Device and Persistent Memory development
From: Jane Chu <jane.chu@oracle.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	nvdimm@lists.linux.dev, Dan Williams <dan.j.williams@intel.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH] mm: Convert DAX lock/unlock page to lock/unlock folio
Date: Thu, 24 Aug 2023 14:30:21 -0700
Message-ID: <b2b0fce8-b7f8-420e-0945-ab9581b23d9a@oracle.com>
In-Reply-To: <ZOeq4HJwCULHPtaU@casper.infradead.org>

On 8/24/2023 12:09 PM, Matthew Wilcox wrote:
> On Thu, Aug 24, 2023 at 11:24:20AM -0700, Jane Chu wrote:
>>
>> On 8/22/2023 4:13 PM, Matthew Wilcox (Oracle) wrote:
>> [..]
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index a6c3af985554..b81d6eb4e6ff 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -1717,16 +1717,11 @@ static int mf_generic_kill_procs(unsigned long long pfn, int flags,
>>>    		struct dev_pagemap *pgmap)
>>>    {
>>>    	struct page *page = pfn_to_page(pfn);
>>
>> Looks like the above line is no longer needed - the 'page' pointer
>> is unused after this conversion.
> 
> So ...
> 
> It seems to me that hwpoison for DAX memory is currently handled on
> a per-allocation basis, but it should probably be handled on a
> per-page basis eventually?

My recollection is that since the inception of 
memory_failure_dev_pagemap(), dax poison handling has been roughly 
per-page, in that .si_addr points to the vaddr of the poisoned 
subpage, while .si_lsb indicates the user mapping size.

> 
> If so, we'd want to do something like this ...
> 
> +++ b/mm/memory-failure.c
> @@ -1755,7 +1755,9 @@ static int mf_generic_kill_procs(unsigned long long pfn, int flags,
>           * Use this flag as an indication that the dax page has been
>           * remapped UC to prevent speculative consumption of poison.
>           */
> -       SetPageHWPoison(&folio->page);
> +       SetPageHWPoison(page);
> +       if (folio_test_large(folio))
> +               folio_set_has_hwpoisoned(folio);
> 
>          /*
>           * Unlike System-RAM there is no possibility to swap in a
> @@ -1766,7 +1768,8 @@ static int mf_generic_kill_procs(unsigned long long pfn, int flags,
>          flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
>          collect_procs(&folio->page, &to_kill, true);
> 
> -       unmap_and_kill(&to_kill, pfn, folio->mapping, folio->index, flags);
> +       unmap_and_kill(&to_kill, pfn, folio->mapping,
> +                       folio->index + folio_page_idx(folio, page), flags);
>   unlock:
>          dax_unlock_folio(folio, cookie);
>          return rc;
> 

This change makes sense to me.
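
Spelling out the index arithmetic for myself (assuming 4K subpages in
a 2M dax folio):

/*
 * folio->index                - file index of the folio's head page
 * folio_page_idx(folio, page) - the subpage's offset within the
 *                               folio, i.e. page - &folio->page
 *
 * Their sum is the file index of the poisoned 4K subpage itself, so
 * unmap_and_kill() resolves the exact user vaddr rather than the
 * head page's.
 */
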
mf_generic_kill_procs() is the generic path, used when the fsdax 
filesystem does not register/provide 
dax_dev->holder_ops->notify_failure. Currently only xfs does the 
registration and so uses the specific version, mf_dax_kill_procs().

> But this is a change in current behaviour and I didn't want to think
> through the implications of all of this.  Would you like to take on this
> project?  ;-)
> 
Sure, I'd be happy to.

> 
> My vague plan for hwpoison in the memdesc world is that poison is always
> handled on a per-page basis (by means of setting page->memdesc to a
> hwpoison data structure).  If the allocation contains multiple pages,
> then we set a flag somewhere like the current has_hwpoisoned flag.

Could you elaborate on "by means of setting page->memdesc to a hwpoison 
data structure" please?

As for the PG_has_hwpoisoned flag, I see only one reference to the 
test helper so far, and none to the clear helper -
$ git grep folio_test_has_hwpoisoned
mm/shmem.c:                 folio_test_has_hwpoisoned(folio)))
$ git grep folio_clear_has_hwpoisoned
<none>

A dax thp is a thp that is potentially recoverable from hwpoison(s). 
If a dax thp has multiple poisoned subpages, we can only call 
folio_clear_has_hwpoisoned() once the last of them has been recovered. 
That implies keeping track of the number of poisoned subpages per thp 
somehow - any thoughts on the best way to do that? Maybe add a field 
in pgmap, or add a query to the driver that returns whether there is 
any hwpoison left in a given range?
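
To make the pgmap-field idea concrete, a sketch (the
dax_poison_count() helper and both wrappers below are entirely
hypothetical):

/* hypothetical helper: returns a counter of currently-poisoned
 * subpages for this folio, stashed in the pgmap (or the driver),
 * keyed by the folio's pfn range
 */
atomic_t *dax_poison_count(struct folio *folio);

static void dax_note_poisoned(struct folio *folio)
{
	if (atomic_inc_return(dax_poison_count(folio)) == 1 &&
	    folio_test_large(folio))
		folio_set_has_hwpoisoned(folio);
}

static void dax_note_recovered(struct folio *folio)
{
	if (atomic_dec_and_test(dax_poison_count(folio)) &&
	    folio_test_large(folio))
		folio_clear_has_hwpoisoned(folio);
}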

thanks!
-jane




