From: Vlastimil Babka <vbabka@suse.cz>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>,
	Mel Gorman <mgorman@suse.de>,
	Rik van Riel <riel@redhat.com>,
	Christoph Lameter <cl@gentwo.org>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Steve Capper <steve.capper@linaro.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.cz>,
	Jerome Marchand <jmarchan@redhat.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCHv6 32/36] thp: reintroduce split_huge_page()
Date: Wed, 10 Jun 2015 17:44:30 +0200
Message-ID: <55785B5E.3000306@suse.cz>
In-Reply-To: <1433351167-125878-33-git-send-email-kirill.shutemov@linux.intel.com>

On 06/03/2015 07:06 PM, Kirill A. Shutemov wrote:
> This patch adds implementation of split_huge_page() for new
> refcountings.
>
> Unlike previous implementation, new split_huge_page() can fail if
> somebody holds GUP pin on the page. It also means that pin on page
> would prevent it from being split under you. It makes situation in
> many places much cleaner.
>
> The basic scheme of split_huge_page():
>
>   - Check that sum of mapcounts of all subpages is equal to page_count()
>     plus one (caller pin). Fall off with -EBUSY. This way we can avoid
>     useless PMD-splits.
>
>   - Freeze the page counters by splitting all PMD and setup migration
>     PTEs.
>
>   - Re-check sum of mapcounts against page_count(). Page's counts are
>     stable now. -EBUSY if page is pinned.
>
>   - Split compound page.
>
>   - Unfreeze the page by removing migration entries.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Tested-by: Sasha Levin <sasha.levin@oracle.com>

[...]
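For readers following along, the first check in the quoted scheme can be sketched in userspace C. This is only a model under stated assumptions: `struct model_huge_page`, `model_total_mapcount()` and `model_can_split()` are hypothetical names, not kernel APIs, and the sketch reads "plus one (caller pin)" as meaning that the reference count must equal the sum of the subpage mapcounts plus the caller's own reference.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model of a huge page; not the kernel's struct page. */
struct model_huge_page {
	int page_count;        /* references held on the head page */
	const int *mapcounts;  /* one mapcount per subpage */
	size_t nr_subpages;
};

/* Sum the mapcounts of all subpages. */
static int model_total_mapcount(const struct model_huge_page *p)
{
	int sum = 0;

	for (size_t i = 0; i < p->nr_subpages; i++)
		sum += p->mapcounts[i];
	return sum;
}

/* Returns 0 if the split may proceed, -EBUSY if an extra pin is present. */
static int model_can_split(const struct model_huge_page *p)
{
	/*
	 * Assumption: each mapping contributes one reference and the
	 * caller holds exactly one more. Anything beyond that is
	 * treated as a pin (for example from GUP).
	 */
	if (p->page_count != model_total_mapcount(p) + 1)
		return -EBUSY;
	return 0;
}
```

In this model, a page with mapcounts {1, 1, 0, 1} and a reference count of 4 passes the check, while the same page with a reference count of 5 is reported as busy. The real function repeats the comparison after the counters are frozen, which the model does not attempt to show.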
> +
> +static int __split_huge_page_tail(struct page *head, int tail,
> +		struct lruvec *lruvec, struct list_head *list)
> +{
> +	int mapcount;
> +	struct page *page_tail = head + tail;
> +
> +	mapcount = page_mapcount(page_tail);

Isn't page_mapcount() unnecessarily heavyweight here? When you are
splitting a page, it should already have zero compound_mapcount() and
shouldn't be PageDoubleMap(), no? So you should only care about
page->_mapcount? Sure, splitting a THP is not a hot path, but when done
512 times per split, it could make some difference in the split's
latency.

> +	VM_BUG_ON_PAGE(atomic_read(&page_tail->_count) != 0, page_tail);
> +
> +	/*
> +	 * tail_page->_count is zero and not changing from under us. But
> +	 * get_page_unless_zero() may be running from under us on the
> +	 * tail_page. If we used atomic_set() below instead of atomic_add(), we
> +	 * would then run atomic_set() concurrently with
> +	 * get_page_unless_zero(), and atomic_set() is implemented in C not
> +	 * using locked ops. spin_unlock on x86 sometime uses locked ops
> +	 * because of PPro errata 66, 92, so unless somebody can guarantee
> +	 * atomic_set() here would be safe on all archs (and not only on x86),
> +	 * it's safer to use atomic_add().

I would be surprised if this was the first place to use atomic_set()
with a potentially concurrent atomic_add(). Shouldn't the atomic_*() API
guarantee that this works?

> +	 */
> +	atomic_add(page_mapcount(page_tail) + 1, &page_tail->_count);

You already have the value in the mapcount variable, so why read it
again?
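The interaction discussed in the quoted comment can be modelled in userspace with C11 atomics. This is a rough sketch, not kernel code: `model_get_unless_zero()` and `model_init_tail_count()` are hypothetical stand-ins for get_page_unless_zero() and the atomic_add() in the hunk, and the sketch assumes the speculative getter is a compare-and-swap loop. The initialiser takes the mapcount as a parameter, so the value is read once and reused.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical model of get_page_unless_zero(): take a reference only
 * if the count is currently non-zero. A caller that observes zero
 * returns without writing to the counter.
 */
static bool model_get_unless_zero(atomic_int *count)
{
	int old = atomic_load(count);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Hypothetical model of the tail-page initialisation: the count starts
 * at zero and is raised with an atomic read-modify-write, using a
 * mapcount value the caller has already read.
 */
static void model_init_tail_count(atomic_int *count, int mapcount)
{
	atomic_fetch_add(count, mapcount + 1);
}
```

In this model, a getter that runs before the initialisation sees zero and fails without modifying the counter, and a getter that runs afterwards increments on top of the new value. The sketch does not settle whether a plain atomic_set() would be equally safe on every architecture, which is the question raised above.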