From: Ryan Roberts <ryan.roberts@arm.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Andrew Morton <akpm@linux-foundation.org>,
"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
David Hildenbrand <david@redhat.com>,
linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
Date: Wed, 1 May 2024 13:58:24 +0100
Message-ID: <2a1b4275-39c5-4c9e-830f-cde16e81c12c@arm.com>
In-Reply-To: <4DDEF271-9DDE-4D24-9F0C-13046CE78C6C@nvidia.com>
On 01/05/2024 13:07, Zi Yan wrote:
> On 1 May 2024, at 7:38, Ryan Roberts wrote:
>
>> Pulling in David, who may be able to advise...
>>
>>
>> On 01/05/2024 12:35, Ryan Roberts wrote:
>>> Zi Yan, I'm hoping you might have some input on the below...
>>>
>>>
>>> On 30/04/2024 14:31, Ryan Roberts wrote:
>>>> __split_huge_pmd_locked() can be called for a present THP, devmap or
>>>> (non-present) migration entry. It calls pmdp_invalidate()
>>>> unconditionally on the pmdp and only determines if it is present or not
>>>> based on the returned old pmd.
>>>>
>>>> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
>>>> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
>>>> pmd_present() calls to return true - even for a swap pmd. Therefore any
>>>> lockless pgtable walker could see the migration entry pmd in this state
>>>> and start interpreting the fields (e.g. pmd_pfn()) as if it were
>>>> present, leading to BadThings (TM). GUP-fast appears to be one such
>>>> lockless pgtable walker.
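To make the failure mode concrete, here is a minimal, self-contained C model of the two flag operations. It is a sketch, not arm64's code: the `model_*` names are hypothetical, and the bit positions (valid at bit 0, present-invalid at bit 59) are assumptions chosen for illustration. It shows that an unconditional mkinvalid turns a not-present swap-like pmd into one the model's present check reports as present, while the `pmd_valid()`-guarded version from the patch leaves it untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define MODEL_SECT_VALID      (UINT64_C(1) << 0)
#define MODEL_PRESENT_INVALID (UINT64_C(1) << 59)

typedef struct { uint64_t val; } model_pmd_t;

/* Hardware-valid: a walker may use the entry directly. */
static bool model_pmd_valid(model_pmd_t pmd)
{
	return (pmd.val & MODEL_SECT_VALID) != 0;
}

/* "Present" means valid OR present-invalid (invalidated but still mapped). */
static bool model_pmd_present(model_pmd_t pmd)
{
	return (pmd.val & (MODEL_SECT_VALID | MODEL_PRESENT_INVALID)) != 0;
}

/* Pre-fix behaviour: set present-invalid unconditionally. */
static model_pmd_t model_mkinvalid_unconditional(model_pmd_t pmd)
{
	pmd.val |= MODEL_PRESENT_INVALID;
	pmd.val &= ~MODEL_SECT_VALID;
	return pmd;
}

/* Patched behaviour: only a valid entry becomes present-invalid. */
static model_pmd_t model_mkinvalid_guarded(model_pmd_t pmd)
{
	if (model_pmd_valid(pmd)) {
		pmd.val |= MODEL_PRESENT_INVALID;
		pmd.val &= ~MODEL_SECT_VALID;
	}
	return pmd;
}

/* Exercises both variants on a made-up swap-like entry. */
static void model_check(void)
{
	/* Hypothetical swap pmd: payload bits set, valid bit clear. */
	model_pmd_t swp = { .val = UINT64_C(0x1234) << 9 };

	assert(!model_pmd_present(swp));
	/* Unconditional invalidation makes it look present. */
	assert(model_pmd_present(model_mkinvalid_unconditional(swp)));
	/* Guarded invalidation is a no-op for the swap entry. */
	assert(model_mkinvalid_guarded(swp).val == swp.val);
}
```

Calling `model_check()` from any test harness runs the three assertions.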
>>>>
>>>> While the obvious fix is for core-mm to avoid such calls for non-present
>>>> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for
>>>> this case either), all other arches that implement pmd_mkinvalid() do it
>>>> in such a way that it is robust to being called with a non-present pmd.
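For comparison, the core-mm alternative mentioned above (only invalidate entries that are actually present, so no TLBI is issued for a swap or migration pmd) can be sketched as follows. This is a hypothetical model with made-up `model_*` names and a single assumed valid bit (no present-invalid state), not the v1 patch or the real `__split_huge_pmd_locked()`; the counter stands in for the TLB invalidation that `pmdp_invalidate()` would issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed, simplified layout: bit 0 = valid; everything else is payload. */
#define MODEL_VALID (UINT64_C(1) << 0)

typedef struct { uint64_t val; } model_pmd_t;

/* Counts simulated TLB invalidations. */
static unsigned int model_tlbi_count;

static bool model_pmd_present(model_pmd_t pmd)
{
	return (pmd.val & MODEL_VALID) != 0;
}

/* Stand-in for pmdp_invalidate(): clear valid, "flush", return old value. */
static model_pmd_t model_pmdp_invalidate(model_pmd_t *pmdp)
{
	model_pmd_t old = *pmdp;

	pmdp->val &= ~MODEL_VALID;
	model_tlbi_count++;
	return old;
}

/*
 * Caller-side guard: only a present pmd is invalidated; a non-present
 * (e.g. migration) entry is read as-is and never handed to the arch helper.
 */
static model_pmd_t model_split_read_old(model_pmd_t *pmdp)
{
	if (model_pmd_present(*pmdp))
		return model_pmdp_invalidate(pmdp);
	return *pmdp;
}
```

With this shape the arch helper never sees a swap pmd, so its robustness to one no longer matters; the cost is that every caller has to remember the check.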
>>>
>>> OK, the plot thickens: the tests I wrote to check that pmd_mkinvalid() is safe for swap entries fail on x86_64. See below...
>>>
>>>> So it is simpler and safer to make arm64 robust too. This approach means
>>>> we can even add tests to debug_vm_pgtable.c to validate the required
>>>> behaviour.
>>>>
>>>> This is a theoretical bug found during code review. I don't have any
>>>> test case to trigger it in practice.
>>>>
>>>> Cc: stable@vger.kernel.org
>>>> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>> ---
>>>>
>>>> Hi all,
>>>>
>>>> v1 of this fix [1] took the approach of fixing core-mm to never call
>>>> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64
>>>> suffers this problem; all other arches are robust. So his suggestion was to
>>>> instead make arm64 robust in the same way and add tests to validate it. Despite
>>>> my stated reservations in the context of the v1 discussion, having thought on it
>>>> for a bit, I now agree with Zi Yan. Hence this post.
>>>>
>>>> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is
>>>> remove it from there and have this go in through the arm64 tree? Assuming there
>>>> is agreement that this approach is the right one.
>>>>
>>>> This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.
>>>>
>>>> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>>>>
>>>> Thanks,
>>>> Ryan
>>>>
>>>>
>>>> arch/arm64/include/asm/pgtable.h | 12 +++++--
>>>> mm/debug_vm_pgtable.c | 61 ++++++++++++++++++++++++++++++++
>>>> 2 files changed, 71 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>>> index afdd56d26ad7..7d580271a46d 100644
>>>> --- a/arch/arm64/include/asm/pgtable.h
>>>> +++ b/arch/arm64/include/asm/pgtable.h
>>>> @@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd)
>>>>
>>>> static inline pmd_t pmd_mkinvalid(pmd_t pmd)
>>>> {
>>>> - pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>>>> - pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>>>> + /*
>>>> + * If not valid then either we are already present-invalid or we are
>>>> + * not-present (i.e. none or swap entry). We must not convert
>>>> + * not-present to present-invalid. Unbelievably, the core-mm may call
>>>> + * pmd_mkinvalid() for a swap entry and all other arches can handle it.
>>>> + */
>>>> + if (pmd_valid(pmd)) {
>>>> + pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
>>>> + pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
>>>> + }
>>>>
>>>> return pmd;
>>>> }
>>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>>> index 65c19025da3d..7e9c387d06b0 100644
>>>> --- a/mm/debug_vm_pgtable.c
>>>> +++ b/mm/debug_vm_pgtable.c
>>>> @@ -956,6 +956,65 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args) { }
>>>> #endif /* CONFIG_HUGETLB_PAGE */
>>>>
>>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>> +#if !defined(__HAVE_ARCH_PMDP_INVALIDATE) && defined(CONFIG_ARCH_ENABLE_THP_MIGRATION)
>>>> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args)
>>>> +{
>>>
>>> Printing various values at different locations in this function for debug:
>>>
>>>> + unsigned long max_swap_offset;
>>>> + swp_entry_t swp_set, swp_clear, swp_convert;
>>>> + pmd_t pmd_set, pmd_clear;
>>>> +
>>>> + /*
>>>> + * See generic_max_swapfile_size(): probe the maximum offset, then
>>>> + * create a swap entry with all possible bits set and a swap entry with
>>>> + * all bits clear.
>>>> + */
>>>> + max_swap_offset = swp_offset(pmd_to_swp_entry(swp_entry_to_pmd(swp_entry(0, ~0UL))));
>>>> + swp_set = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset);
>>>> + swp_clear = swp_entry(0, 0);
>>>> +
>>>> + /* Convert to pmd. */
>>>> + pmd_set = swp_entry_to_pmd(swp_set);
>>>> + pmd_clear = swp_entry_to_pmd(swp_clear);
>>>
>>> [ 0.702163] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: valid: pmd_set=f800000000000000, pmd_clear=7fffffffffffe00
>>>
>>>> +
>>>> + /*
>>>> + * Sanity check that the pmds are not-present, not-huge and swap entry
>>>> + * is recoverable without corruption.
>>>> + */
>>>> + WARN_ON(pmd_present(pmd_set));
>>>> + WARN_ON(pmd_trans_huge(pmd_set));
>>>> + swp_convert = pmd_to_swp_entry(pmd_set);
>>>> + WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
>>>> + WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
>>>> + WARN_ON(pmd_present(pmd_clear));
>>>> + WARN_ON(pmd_trans_huge(pmd_clear));
>>>> + swp_convert = pmd_to_swp_entry(pmd_clear);
>>>> + WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
>>>> + WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
>>>> +
>>>> + /* Now invalidate the pmd. */
>>>> + pmd_set = pmd_mkinvalid(pmd_set);
>>>> + pmd_clear = pmd_mkinvalid(pmd_clear);
>>>
>>> [ 0.704452] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: invalid: pmd_set=f800000000000000, pmd_clear=7ffffffffe00e00
>>>
>>>> +
>>>> + /*
>>>> + * Since it's a swap pmd, invalidation should effectively be a noop and
>>>> + * the checks we already did should give the same answer. Check the
>>>> + * invalidation didn't corrupt any fields.
>>>> + */
>>>> + WARN_ON(pmd_present(pmd_set));
>>>> + WARN_ON(pmd_trans_huge(pmd_set));
>>>> + swp_convert = pmd_to_swp_entry(pmd_set);
>>>
>>> [ 0.706461] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: set: swp=7c03ffffffffffff (1f, 3ffffffffffff), convert=7c03ffffffffffff (1f, 3ffffffffffff)
>>>
>>>> + WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
>>>> + WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
>>>> + WARN_ON(pmd_present(pmd_clear));
>>>> + WARN_ON(pmd_trans_huge(pmd_clear));
>>>> + swp_convert = pmd_to_swp_entry(pmd_clear);
>>>
>>> [ 0.708841] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: clear: swp=0 (0, 0), convert=ff8 (0, ff8)
>>>
>>>> + WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
>>>> + WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
>>>
>>> This line fails on x86_64.
>>>
>>> The logs show that the offset is indeed being corrupted by pmd_mkinvalid(); 0 -> 0xff8.
>>>
>>> I think this is due to x86's pmd_mkinvalid() assuming the pmd is present; pmd_flags() and pmd_pfn() do all sorts of weird and wonderful things.
>>>
>>> So does this take us full circle? Are we now back to modifying the core-mm to never call pmd_mkinvalid() on a non-present entry? If so, then I guess we should remove the arm64 fix from for-next/fixes.
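The corruption can be read straight off the two logged `pmd_clear` values. The sketch below assumes, as the logs suggest (offset 0 encodes with bits 9 to 58 all set, and type 0x1f occupies the top five bits), that x86_64 stores the swap offset bit-inverted in bits 9..58 of the pmd; the `model_*` helpers are hypothetical. XORing the before and after values shows that exactly bits 12..20 were cleared, and decoding the after value yields offset 0xff8, matching the `convert=ff8` log line.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 swap-pmd offset field, inferred from the logged values. */
#define MODEL_OFFSET_SHIFT 9
#define MODEL_OFFSET_BITS  50
#define MODEL_OFFSET_MASK  ((UINT64_C(1) << MODEL_OFFSET_BITS) - 1)

/* Hypothetical decoder: the offset is stored bit-inverted. */
static uint64_t model_x86_swp_offset(uint64_t pmd)
{
	return (~pmd >> MODEL_OFFSET_SHIFT) & MODEL_OFFSET_MASK;
}

/* pmd_clear from the logs, before and after pmd_mkinvalid(). */
static const uint64_t pmd_clear_before = UINT64_C(0x07fffffffffffe00);
static const uint64_t pmd_clear_after  = UINT64_C(0x07ffffffffe00e00);

/* Bits that pmd_mkinvalid() changed: 0x1ff000, i.e. bits 12..20. */
static uint64_t model_changed_bits(void)
{
	return pmd_clear_before ^ pmd_clear_after;
}
```

Bits 12..20 are the nine bits between 4 KiB and 2 MiB alignment, which would be consistent with a PMD-sized PFN mask being applied to an entry that does not actually hold a PFN; that is an inference from the numbers only.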
>
> If x86_64's pmd_mkinvalid() also corrupts swap entries, yes, your original fix
> is better. I will dig into the x86 code more to figure out what goes wrong.
> Last time, I only checked PAGE_* bits in these pmd|pte_* operations.
> Sorry for the misinformation.
No worries. I'll make the amendments we originally agreed on for the original fix and resend.
>
>>>
>>>> +}
>>>> +#else
>>>> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args) { }
>>>> +#endif /* !__HAVE_ARCH_PMDP_INVALIDATE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
>>>> +
>>>> static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>>>> {
>>>> pmd_t pmd;
>>>> @@ -982,6 +1041,8 @@ static void __init pmd_thp_tests(struct pgtable_debug_args *args)
>>>> WARN_ON(!pmd_trans_huge(pmd_mkinvalid(pmd_mkhuge(pmd))));
>>>> WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd))));
>>>> #endif /* __HAVE_ARCH_PMDP_INVALIDATE */
>>>> +
>>>> + swp_pmd_mkinvalid_tests(args);
>>>> }
>>>>
>>>> #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>>> --
>>>> 2.25.1
>>>>
>>>
>
>
> --
> Best Regards,
> Yan, Zi