From: Ding Hui <dinghui@sangfor.com.cn>
To: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Oscar Salvador <osalvador@suse.de>,
	Michal Hocko <mhocko@suse.com>, Tony Luck <tony.luck@intel.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v1 6/6] mm/hwpoison: fix unpoison_memory()
Date: Sat, 19 Jun 2021 20:22:32 +0800
Message-ID: <a2162c28-78ce-1ce7-327c-e8c4dce164cd@sangfor.com.cn>
In-Reply-To: <20210618083625.GA2215283@hori.linux.bs1.fc.nec.co.jp>

On 2021/6/18 4:36 PM, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Thu, Jun 17, 2021 at 06:00:21PM +0800, Ding Hui wrote:
>> On 2021/6/14 10:12, Naoya Horiguchi wrote:
>>> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>
>>> After the recent soft-offline rework, error pages can be taken off the
>>> buddy allocator, but the existing unpoison_memory() does not properly
>>> undo the operation.  Moreover, due to the recent change to
>>> __get_hwpoison_page(), get_page_unless_zero() is hardly ever called for
>>> hwpoisoned pages.  So __get_hwpoison_page() mostly returns zero (meaning
>>> it failed to grab a page refcount) and unpoison just clears PG_hwpoison
>>> without releasing a refcount.  That does not lead to a critical issue
>>> like a kernel panic, but unpoisoned pages never get back to buddy (leaked
>>> permanently), which is not good.
>>
>> As I mentioned in [1], I'm not sure about the exact meaning of "broken" in
>> unpoison_memory().
>>
>> Maybe the misunderstanding is this:
>>
>> I think __get_hwpoison_page() mostly returns one for a hwpoisoned page.
>> 06be6ff3d2ec ("mm,hwpoison: rework soft offline for free pages")
>> introduced page_handle_poison(), which adds a refcount to every
>> soft-offlined hwpoison page.
>> In memory_failure() for hard-offline, page_ref_inc() is called on free pages
>> too, and for in-use pages we do not call put_page() after
>> get_hwpoison_page() != 0.
>> So, races aside, the refcount of every hwpoisoned page must be greater than
>> zero by the time unpoison_memory() runs.
> 
> Hi, Ding,
> 
> Thanks for the comment.  I feel that I failed to define the exact issue in
> unpoison.  Maybe I saw some random error while developing other hwpoison
> patches and misinterpreted it as an unpoison issue, so please don't take my
> earlier word "broken" seriously, sorry about that.
> 
> Anyway, I will reconsider how to handle this 6/6, maybe with a clearer
> description of the problem and a simpler patch.
> 
>>
>> Recently I ran a test that soft-offlines random pages and unpoisons them
>> in a loop for days, and it works fine for me (with bac9c6fa1f92 applied).
> 
> Thank you for testing,
> 

Hi Naoya,

I'm afraid my description of the testing was ambiguous, so let me
clarify: I ran the stress soft-offline test case from the mce-test
project (https://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git) for
days to verify my NR_FREE_PAGES change (bac9c6fa1f92), without your
current patchset applied. The case soft-offlines random pages and
unpoisons them in a loop, and it works basically fine for me.
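
For readers without mce-test at hand, the loop described above can be sketched roughly as follows. This is not the mce-test case itself, only a minimal illustration under stated assumptions: it uses the soft_offline_page sysfs file (takes a physical address) and the unpoison-pfn debugfs file from the hwpoison-inject module (takes a PFN). The helper names plan() and run() and the PFN range are hypothetical, and the script defaults to a dry run that only prints the writes it would perform.

```python
import os
import random

# Kernel interfaces used by the sketch. Writing to them needs root,
# CONFIG_MEMORY_FAILURE, and the hwpoison-inject module with debugfs mounted.
SOFT_OFFLINE = "/sys/devices/system/memory/soft_offline_page"  # takes a physical address
UNPOISON_PFN = "/sys/kernel/debug/hwpoison/unpoison-pfn"       # takes a page frame number

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")


def plan(pfn_lo, pfn_hi, count, seed=None):
    """Build (path, value) writes: soft-offline a random PFN, then unpoison it.

    The PFN range is an assumption supplied by the caller; choosing a range
    that actually maps to RAM is left to the user.
    """
    rng = random.Random(seed)
    steps = []
    for _ in range(count):
        pfn = rng.randrange(pfn_lo, pfn_hi)
        steps.append((SOFT_OFFLINE, hex(pfn * PAGE_SIZE)))
        steps.append((UNPOISON_PFN, hex(pfn)))
    return steps


def run(steps, dry_run=True):
    """Perform the writes, or only print them when dry_run is set."""
    for path, value in steps:
        if dry_run:
            print(f"echo {value} > {path}")
            continue
        try:
            with open(path, "w") as f:
                f.write(value)
        except OSError as err:
            # Soft offline may fail for some pages; report it and keep looping.
            print(f"{path} <- {value}: {err}")


if __name__ == "__main__":
    # Hypothetical PFN range, dry run only.
    run(plan(0x10000, 0x20000, count=4), dry_run=True)
```

Passing dry_run=False performs the real writes, so that should only be done on a disposable test machine.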

-- 
Thanks,
-dinghui
