All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Anshuman Khandual <anshuman.khandual@arm.com>
To: Zi Yan <ziy@nvidia.com>, Ryan Roberts <ryan.roberts@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Huacai Chen <chenhuacai@kernel.org>
Subject: Re: [PATCH v1] mm: Fix race between __split_huge_pmd_locked() and GUP-fast
Date: Fri, 26 Apr 2024 10:20:03 +0530	[thread overview]
Message-ID: <af20fd17-1e31-4e28-8773-1298aa799aec@arm.com> (raw)
In-Reply-To: <922F6794-1C32-4862-8A67-90D4F7DBC474@nvidia.com>



On 4/26/24 00:28, Zi Yan wrote:
> +Anshuman, who changed pmd_present() semantics. See:
> https://lore.kernel.org/all/1599627183-14453-2-git-send-email-anshuman.khandual@arm.com/ and commit b65399f6111b ("arm64/mm: Change
> THP helpers to comply with generic MM semantics")
> 
> On 25 Apr 2024, at 13:07, Ryan Roberts wrote:
> 
>> __split_huge_pmd_locked() can be called for a present THP, devmap or
>> (non-present) migration entry. It calls pmdp_invalidate()
>> unconditionally on the pmdp and only determines if it is present or not
>> based on the returned old pmd. This is a problem for the migration entry
>> case because pmd_mkinvalid(), called by pmdp_invalidate() must only be
>> called for a present pmd.
>>
>> On arm64 at least, pmd_mkinvalid() will mark the pmd such that any
>> future call to pmd_present() will return true. And therefore any
> 
> But pmd_mkinvalid() on x86 does not behave so. Maybe we should fix
> pmd_mkinvalid() on arm64 by not setting PMD_PRESENT_INVALID when the
> entry is invalid already. And add a test in mm/debug_vm_pgtable.c.
> 
> I notice that x86, risc-v, mips behave the same. loongarch also
> has _PAGE_PRESENT_INVALID bit set during pmd_mkinvalid(), but its
> pmd_present() makes sure _PAGE_HUEG is set before checks _PAGE_PRESENT_INVALID.
> So it is not a problem for loongarch. Add Huacai to confirm this.
> 
> Maybe pmd_present() on arm64 can do that too?

pmd_present() should return true even for a splitting PMD which is not
currently mapped. IIRC in all other architectures, there is a distinct
identification bit for huge page which stays back, even when the entry
becomes unmapped. That bit helps pmd_present() return true, during PMD
splitting process.

But on arm64 platform

#define PTE_VALID                (_AT(pteval_t, 1) << 0)
#define PMD_SECT_VALID           (_AT(pmdval_t, 1) << 0)
#define PMD_TABLE_BIT            (_AT(pmdval_t, 1) << 1)

When the PMD entry becomes unmapped, PTE_VALID gets cleared, so does the
PMD_SECT_VALID, thus erasing its identity as a huge mapping. A software
bit PMD_PRESENT_INVALID was added which helps in preserving that cleared
huge page mapping identity once it becomes unmapped.

> 
>> lockless pgtable walker could see the migration entry pmd in this state
>> and start interpretting the fields as if it were present, leading to
>> BadThings (TM). GUP-fast appears to be one such lockless pgtable walker.
>> I suspect the same is possible on other architectures.
>>
>> Fix this by only calling pmdp_invalidate() for a present pmd. And for
>> good measure let's add a warning to the generic implementation of
>> pmdp_invalidate(). I've manually reviewed all other
>> pmdp_invalidate[_ad]() call sites and believe all others to be
>> conformant.
>>
>> This is a theoretical bug found during code review. I don't have any
>> test case to trigger it in practice.
>>
>> Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>
>> Applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
>>
>> Thanks,
>> Ryan
>>
>>
>>  mm/huge_memory.c     | 5 +++--
>>  mm/pgtable-generic.c | 2 ++
>>  2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 89f58c7603b2..80939ad00718 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2513,12 +2513,12 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>>  	 * for this pmd), then we flush the SMP TLB and finally we write the
>>  	 * non-huge version of the pmd entry with pmd_populate.
>>  	 */
>> -	old_pmd = pmdp_invalidate(vma, haddr, pmd);
>>
>> -	pmd_migration = is_pmd_migration_entry(old_pmd);
>> +	pmd_migration = is_pmd_migration_entry(*pmd);
>>  	if (unlikely(pmd_migration)) {
>>  		swp_entry_t entry;
>>
>> +		old_pmd = *pmd;
>>  		entry = pmd_to_swp_entry(old_pmd);
>>  		page = pfn_swap_entry_to_page(entry);
>>  		write = is_writable_migration_entry(entry);
>> @@ -2529,6 +2529,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>>  		soft_dirty = pmd_swp_soft_dirty(old_pmd);
>>  		uffd_wp = pmd_swp_uffd_wp(old_pmd);
>>  	} else {
>> +		old_pmd = pmdp_invalidate(vma, haddr, pmd);
>>  		page = pmd_page(old_pmd);
>>  		folio = page_folio(page);
>>  		if (pmd_dirty(old_pmd)) {
>> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
>> index 4fcd959dcc4d..74e34ea90656 100644
>> --- a/mm/pgtable-generic.c
>> +++ b/mm/pgtable-generic.c
>> @@ -198,6 +198,7 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
>>  pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>>  		     pmd_t *pmdp)
>>  {
>> +	VM_WARN_ON(!pmd_present(*pmdp));
>>  	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
>>  	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
>>  	return old;
>> @@ -208,6 +209,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>>  pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
>>  			 pmd_t *pmdp)
>>  {
>> +	VM_WARN_ON(!pmd_present(*pmdp));  	return pmdp_invalidate(vma, address, pmdp);
>>  }
>>  #endif
>> --
>> 2.25.1
> 
> 
> --
> Best Regards,
> Yan, Zi

  reply	other threads:[~2024-04-26  4:50 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-25 17:07 [PATCH v1] mm: Fix race between __split_huge_pmd_locked() and GUP-fast Ryan Roberts
2024-04-25 18:58 ` Zi Yan
2024-04-26  4:50   ` Anshuman Khandual [this message]
2024-04-26 14:33     ` Zi Yan
2024-04-29  3:36       ` Anshuman Khandual
2024-04-26  7:48   ` Ryan Roberts
2024-04-26  4:19 ` Anshuman Khandual
2024-04-26  7:43   ` Ryan Roberts
2024-04-26 14:49     ` Zi Yan
2024-04-26 14:53       ` Zi Yan
2024-04-27  4:25         ` John Hubbard
2024-04-27 15:07           ` Zi Yan
2024-04-29  5:31             ` Anshuman Khandual
2024-04-29  5:25       ` Anshuman Khandual
2024-04-29  5:07     ` Anshuman Khandual
2024-04-27  4:41 ` John Hubbard
2024-04-27 15:14   ` Zi Yan
2024-04-27 19:11     ` John Hubbard
2024-04-27 20:45       ` Zi Yan
2024-04-27 20:48         ` Zi Yan
2024-04-29  6:17           ` Anshuman Khandual
2024-04-29 14:41             ` Zi Yan
2024-04-29  9:29       ` Ryan Roberts
2024-04-29 14:45         ` Zi Yan
2024-04-29 15:29           ` Zi Yan
2024-04-29 15:35             ` Ryan Roberts
2024-04-29 15:34           ` Ryan Roberts
2024-04-29 16:02             ` Zi Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=af20fd17-1e31-4e28-8773-1298aa799aec@arm.com \
    --to=anshuman.khandual@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@kernel.org \
    --cc=chenhuacai@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ryan.roberts@arm.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.