From: Mike Kravetz <mike.kravetz@oracle.com>
To: Michal Hocko <mhocko@suse.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Roman Gushchin <guro@fb.com>, Shakeel Butt <shakeelb@google.com>,
	Oscar Salvador <osalvador@suse.de>,
	David Hildenbrand <david@redhat.com>,
	Muchun Song <songmuchun@bytedance.com>,
	David Rientjes <rientjes@google.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Matthew Wilcox <willy@infradead.org>,
	HORIGUCHI NAOYA <naoya.horiguchi@nec.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	Waiman Long <longman@redhat.com>, Peter Xu <peterx@redhat.com>,
	Mina Almasry <almasrymina@google.com>,
	Hillf Danton <hdanton@sina.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Barry Song <song.bao.hua@hisilicon.com>,
	Will Deacon <will@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v4 4/8] hugetlb: create remove_hugetlb_page() to separate functionality
Date: Tue, 6 Apr 2021 09:49:13 -0700
Message-ID: <b684d7bc-4c59-0beb-3af7-a75e76e77a87@oracle.com>
In-Reply-To: <YGwwO0galuKQsD0J@dhcp22.suse.cz>

On 4/6/21 2:56 AM, Michal Hocko wrote:
> On Mon 05-04-21 16:00:39, Mike Kravetz wrote:
>> The new remove_hugetlb_page() routine is designed to remove a hugetlb
>> page from hugetlbfs processing.  It will remove the page from the active
>> or free list, update global counters and set the compound page
>> destructor to NULL so that PageHuge() will return false for the 'page'.
>> After this call, the 'page' can be treated as a normal compound page or
>> a collection of base size pages.
>>
>> update_and_free_page no longer decrements h->nr_huge_pages{_node} as
>> this is performed in remove_hugetlb_page.  The only functionality
>> performed by update_and_free_page is to free the base pages to the lower
>> level allocators.
>>
>> update_and_free_page is typically called after remove_hugetlb_page.
>>
>> remove_hugetlb_page is to be called with the hugetlb_lock held.
>>
>> Creating this routine and separating functionality is in preparation for
>> restructuring code to reduce lock hold times.  This commit should not
>> introduce any changes to functionality.
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> 
> Btw. I would prefer to reverse the ordering of this and Oscar's
> patchset. This one is a bug fix which might be interesting for stable
> backports while Oscar's work can be looked as a general functionality
> improvement.

Ok, that makes sense.

Andrew, can we make this happen?  It would require removing Oscar's
series until it can be modified to work on top of this.
There is only one small issue with this series as it originally went
into mmotm.  There is a missing conversion of spin_lock to spin_lock_irq
in patch 7.  In addition, there are some suggested changes from Oscar to
this patch.  I do not think they are necessary, but I could make those
as well.  Let me know what I can do to help make this happen.

>> @@ -2298,6 +2312,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>>  		/*
>>  		 * Freed from under us. Drop new_page too.
>>  		 */
>> +		remove_hugetlb_page(h, new_page, false);
>>  		update_and_free_page(h, new_page);
>>  		goto unlock;
>>  	} else if (page_count(old_page)) {
>> @@ -2305,6 +2320,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>>  		 * Someone has grabbed the page, try to isolate it here.
>>  		 * Fail with -EBUSY if not possible.
>>  		 */
>> +		remove_hugetlb_page(h, new_page, false);
>>  		update_and_free_page(h, new_page);
>>  		spin_unlock(&hugetlb_lock);
>>  		if (!isolate_huge_page(old_page, list))
> 
> the page is not enqued anywhere here so remove_hugetlb_page would blow
> when linked list debugging is enabled.

I also thought this would be an issue.  However, INIT_LIST_HEAD would
already have been called for the page, so its list pointers refer back
to the entry itself:

static inline void INIT_LIST_HEAD(struct list_head *list)
{
        WRITE_ONCE(list->next, list);
        list->prev = list;
}

The debug checks of concern in __list_del_entry_valid are:

            CHECK_DATA_CORRUPTION(prev->next != entry,
                        "list_del corruption. prev->next should be %px, but was %px\n",
                        entry, prev->next) ||
            CHECK_DATA_CORRUPTION(next->prev != entry,
                        "list_del corruption. next->prev should be %px, but was %px\n",
                        entry, next->prev))

Since all pointers point back to the list head itself, both checks pass.
My track record with the list routines is not so good, so I actually
forced a list_del after INIT_LIST_HEAD with list debugging enabled and
did not encounter any issues.
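
The reasoning above can be reproduced outside the kernel.  What follows
is only a userspace sketch, not the kernel code: struct list_head is
redefined locally, and init_list_head(), list_del_entry_valid() and
list_del_checked() are stand-in names that model just the two link
checks quoted above (the real __list_del_entry_valid also checks for
poison values, which is left out here).

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Same effect as INIT_LIST_HEAD: the entry points at itself. */
static void init_list_head(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/*
 * Models only the two link checks quoted above.  Returns true when
 * both neighbours point back at 'entry'.
 */
static bool list_del_entry_valid(const struct list_head *entry)
{
	return entry->prev->next == entry && entry->next->prev == entry;
}

/* Unlink 'entry' only if the checks pass; report whether it was unlinked. */
static bool list_del_checked(struct list_head *entry)
{
	if (!list_del_entry_valid(entry))
		return false;
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	return true;
}
```

For an entry that was only initialized, prev and next are both the
entry itself, so prev->next == entry and next->prev == entry hold
trivially and the unlink is a no-op that leaves the entry pointing at
itself.  An entry whose neighbours do not point back at it fails the
check.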

Going forward, I agree it would be better to perhaps add a list_empty
check so that things do not blow up if the debugging code is changed.
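
One possible shape for such a guard, again as a userspace sketch rather
than the actual patch: unlink_if_enqueued() is a hypothetical helper
name, and the list primitives are local re-implementations that only
mirror the kernel's list_empty and list_add semantics.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* True when the entry points at itself, i.e. it is on no list. */
static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Insert 'entry' right after 'head'. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/*
 * Hypothetical helper: skip the unlink entirely when the entry was
 * never enqueued, so the result does not depend on how the list
 * debugging checks treat a self-pointing entry.
 */
static bool unlink_if_enqueued(struct list_head *entry)
{
	if (list_empty(entry))
		return false;
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	init_list_head(entry);
	return true;
}
```

With this guard, a freshly initialized entry is never handed to the
unlink path at all, and an entry that was removed once can safely be
passed in again.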

At one time I also thought of splitting the functionality in
alloc_fresh_huge_page and prep_new_huge_page so that it would only
allocate/prep the page but not increment nr_huge_pages.  A new routine
would be used to increment the counter when it was actually put into use.
I thought this could be used when doing bulk adjustments in set_max_huge_pages
but the benefit would be minimal.  This seems like something that would
be useful in Oscar's alloc_and_dissolve_huge_page routine.
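
To make the idea concrete, here is a toy userspace model of that split.
Everything in it is an assumption for illustration: hstate_model,
huge_page_model, prep_huge_page_unaccounted(), account_huge_page() and
discard_unaccounted_page() are hypothetical names, and no real
allocation or hugetlb state is involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the per-hstate pool counter. */
struct hstate_model {
	unsigned long nr_huge_pages;
};

/* Toy stand-in for a huge page and its accounting state. */
struct huge_page_model {
	bool prepped;
	bool accounted;
};

/* Step 1: allocate/prep only.  The pool counter is not touched. */
static void prep_huge_page_unaccounted(struct huge_page_model *page)
{
	page->prepped = true;
	page->accounted = false;
}

/* Step 2: bump the counter when the page is actually put into use. */
static bool account_huge_page(struct hstate_model *h,
			      struct huge_page_model *page)
{
	if (!page->prepped || page->accounted)
		return false;
	h->nr_huge_pages++;
	page->accounted = true;
	return true;
}

/*
 * Dropping a page that was never accounted needs no counter update,
 * which is the case hit when the old page is freed or grabbed first.
 */
static bool discard_unaccounted_page(struct huge_page_model *page)
{
	if (page->accounted)
		return false;
	page->prepped = false;
	return true;
}
```

In this model, the error paths that give up on the new page would only
need the discard step, because the counter was never incremented for
it.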
-- 
Mike Kravetz