From: David Hildenbrand <david@redhat.com>
To: Guillaume Morin <guillaume@morinfr.org>
Cc: oleg@redhat.com, linux-kernel@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org, muchun.song@linux.dev
Subject: Re: [RFC][PATCH] uprobe: support for private hugetlb mappings
Date: Fri, 26 Apr 2024 09:19:13 +0200
Message-ID: <385d3516-95bb-4ff9-9d60-ac4e46104130@redhat.com>
In-Reply-To: <Zirw0uINbP6GxFiK@bender.morinfr.org>

On 26.04.24 02:09, Guillaume Morin wrote:
> On 25 Apr 21:56, David Hildenbrand wrote:
>>
>> On 25.04.24 17:19, Guillaume Morin wrote:
>>> On 24 Apr 23:00, David Hildenbrand wrote:
>>>>> One issue here is that FOLL_FORCE|FOLL_WRITE is not implemented for
>>>>> hugetlb mappings. However this was also on my TODO and I have a draft
>>>>> patch that implements it.
>>>>
>>>> Yes, I documented it back then and added sanity checks in GUP code to fence
>>>> it off. Shouldn't be too hard to implement (famous last words) and would be
>>>> the cleaner thing to use here once I manage to switch over to
>>>> FOLL_WRITE|FOLL_FORCE to break COW.
>>>
>>> Yes, my patch seems to be working. The hugetlb code is pretty simple.
>>> And it allows ptrace and the proc pid mem file to work on the executable
>>> private hugetlb mappings.
>>>
>>> There is one thing I am unclear about though. hugetlb enforces that
>>> huge_pte_write() is true on FOLL_WRITE in both the fault and
>>> follow_page_mask paths. I am not sure if we can simply assume in the
>>> hugetlb code that if the pte is not writable and this is a write fault
>>> then we're in the FOLL_FORCE|FOLL_WRITE case.  Or do we want to keep the
>>> checks and simply not enforce them for FOLL_FORCE|FOLL_WRITE?
>>>
>>> The latter is more complicated in the fault path because there is no
>>> FAULT_FLAG_FORCE flag.
>>>
>>
>> I just pushed something to
>> 	https://github.com/davidhildenbrand/linux/tree/uprobes_cow
>>
>> Only very lightly tested so far. Expect the worst :)
> 
> 
> I'll try it out and send you the hugetlb bits
> 
>>
>> I still detest having the zapping logic there, but to get it all right I
>> don't see a clean way around that.
>>
>>
>> For hugetlb, we'd primarily have to implement the
>> mm_walk_ops->hugetlb_entry() callback (well, and FOLL_FORCE).
> 
> For FOLL_FORCE, here is my draft. Let me know if this is what you had in
> mind.
> 
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 1611e73b1121..ac60e0ae64e8 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1056,9 +1056,6 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
>   		if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
>   			if (!(gup_flags & FOLL_FORCE))
>   				return -EFAULT;
> -			/* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */
> -			if (is_vm_hugetlb_page(vma))
> -				return -EFAULT;
>   			/*
>   			 * We used to let the write,force case do COW in a
>   			 * VM_MAYWRITE VM_SHARED !VM_WRITE vma, so ptrace could
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 3548eae42cf9..73f86eddf888 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5941,7 +5941,8 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>   		       struct folio *pagecache_folio, spinlock_t *ptl,
>   		       struct vm_fault *vmf)
>   {
> -	const bool unshare = flags & FAULT_FLAG_UNSHARE;
> +	const bool make_writable = !(flags & FAULT_FLAG_UNSHARE) &&
> +		(vma->vm_flags & VM_WRITE);
>   	pte_t pte = huge_ptep_get(ptep);
>   	struct hstate *h = hstate_vma(vma);
>   	struct folio *old_folio;
> @@ -5959,16 +5960,9 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>   	 * can trigger this, because hugetlb_fault() will always resolve
>   	 * uffd-wp bit first.
>   	 */
> -	if (!unshare && huge_pte_uffd_wp(pte))
> +	if (make_writable && huge_pte_uffd_wp(pte))
>   		return 0;
>   
> -	/*
> -	 * hugetlb does not support FOLL_FORCE-style write faults that keep the
> -	 * PTE mapped R/O such as maybe_mkwrite() would do.
> -	 */
> -	if (WARN_ON_ONCE(!unshare && !(vma->vm_flags & VM_WRITE)))
> -		return VM_FAULT_SIGSEGV;
> -
>   	/* Let's take out MAP_SHARED mappings first. */
>   	if (vma->vm_flags & VM_MAYSHARE) {
>   		set_huge_ptep_writable(vma, haddr, ptep);
> @@ -5989,7 +5983,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>   			folio_move_anon_rmap(old_folio, vma);
>   			SetPageAnonExclusive(&old_folio->page);
>   		}
> -		if (likely(!unshare))
> +		if (likely(make_writable))
>   			set_huge_ptep_writable(vma, haddr, ptep);

Maybe we want to refactor that similarly into a 
set_huge_ptep_maybe_writable, and handle the VM_WRITE check internally.

Then, here you'd do

if (unshare)
	set_huge_ptep(vma, haddr, ptep);
else
	set_huge_ptep_maybe_writable(vma, haddr, ptep);

Something like that.
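
To make the idea concrete, here is a minimal userspace sketch of what a
set_huge_ptep_maybe_writable() could do. Everything in it is an assumption
for illustration: the model_* types and MODEL_VM_WRITE are stand-ins, not
kernel definitions, and marking the entry dirty mirrors what the
non-hugetlb path does with maybe_mkwrite(pte_mkdirty(entry), vma) as far
as I understand it. It is not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's VM_WRITE; the value is arbitrary in this model. */
#define MODEL_VM_WRITE 0x2UL

/* Minimal stand-ins for struct vm_area_struct and a huge PTE. */
struct model_vma {
	unsigned long vm_flags;
};

struct model_pte {
	bool write;
	bool dirty;
};

/*
 * Model of a hypothetical set_huge_ptep_maybe_writable(): always mark the
 * entry dirty, but grant write permission only if the VMA allows writes.
 */
static void model_set_huge_ptep_maybe_writable(const struct model_vma *vma,
					       struct model_pte *ptep)
{
	ptep->dirty = true;
	if (vma->vm_flags & MODEL_VM_WRITE)
		ptep->write = true;
}

/* Usage: a FOLL_FORCE-style write to a read-only VMA keeps the entry R/O. */
static void model_selfcheck(void)
{
	struct model_vma ro_vma = { .vm_flags = 0 };
	struct model_pte pte = { .write = false, .dirty = false };

	model_set_huge_ptep_maybe_writable(&ro_vma, &pte);
	assert(pte.dirty && !pte.write);
}
```

With the VM_WRITE check inside the helper, the caller no longer needs a
make_writable local; it only has to distinguish the unshare case.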



>   		/* Break COW or unshare */
>   		huge_ptep_clear_flush(vma, haddr, ptep);
> @@ -6883,6 +6878,17 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
>   }
>   #endif /* CONFIG_USERFAULTFD */
>   
> +static bool is_force_follow(struct vm_area_struct* vma, unsigned int flags,
> +			     struct page* page) {
> +	if (vma->vm_flags & VM_WRITE)
> +		return false;
> +
> +	if (!(flags & FOLL_FORCE))
> +		return false;
> +
> +	return page && PageAnon(page) && page_mapcount(page) == 1;
> +}

A couple of points:

a) Don't use page_mapcount(). Either use folio_mapcount() or, more 
likely, check PageAnonExclusive() instead.

b) If you're not following the can_follow_write_pte/_pmd model, you are 
doing something wrong :)

c) The code was heavily changed in mm/mm-unstable. It was merged with 
the common code.

Likely, in mm/mm-unstable, the existing can_follow_write_pte and 
can_follow_write_pmd checks will already cover what you want in most cases.

We'd need a can_follow_write_pud() to cover follow_huge_pud() and 
(unfortunately) something similar to handle follow_hugepd() as well.

Copy-pasting what we do in can_follow_write_pte() and adjusting for 
different PTE types is the right thing to do. Maybe now it's time to 
factor out the common checks into a separate helper.
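
A rough userspace model of such a factoring, to show the shape only. All
names here (the model_* types, the MODEL_* flags, model_can_follow_write_common()
and model_can_follow_write_entry()) are hypothetical stand-ins, and the
order of checks follows can_follow_write_pte() in mm/gup.c as I remember
it, so verify against the tree before relying on it. The level-specific
parts (is the entry writable, is a write fault still needed for
soft-dirty or uffd-wp) are passed in as plain booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag values for this model only; not the kernel's definitions. */
#define MODEL_VM_WRITE		0x1UL
#define MODEL_VM_SHARED		0x2UL
#define MODEL_VM_MAYWRITE	0x4UL
#define MODEL_VM_MAYSHARE	0x8UL
#define MODEL_FOLL_FORCE	0x1U

struct model_vma {
	unsigned long vm_flags;
};

struct model_page {
	bool anon;		/* models PageAnon() */
	bool anon_exclusive;	/* models PageAnonExclusive() */
};

/* Hypothetical shared helper: checks independent of the page table level. */
static bool model_can_follow_write_common(const struct model_page *page,
					  const struct model_vma *vma,
					  unsigned int flags)
{
	/* Without FOLL_FORCE, a non-writable entry always needs a fault. */
	if (!(flags & MODEL_FOLL_FORCE))
		return false;
	/* FOLL_FORCE has no effect on shared mappings ... */
	if (vma->vm_flags & (MODEL_VM_MAYSHARE | MODEL_VM_SHARED))
		return false;
	/* ... or on private mappings that can never become writable ... */
	if (!(vma->vm_flags & MODEL_VM_MAYWRITE))
		return false;
	/* ... or on writable ones that just need to take a write fault. */
	if (vma->vm_flags & MODEL_VM_WRITE)
		return false;
	/* COW must already be broken: an exclusive anonymous page. */
	return page != NULL && page->anon && page->anon_exclusive;
}

/* Per-level wrapper; pte, pmd and pud variants would differ only in inputs. */
static bool model_can_follow_write_entry(bool entry_writable,
					 bool needs_write_fault,
					 const struct model_page *page,
					 const struct model_vma *vma,
					 unsigned int flags)
{
	if (entry_writable)
		return true;
	if (!model_can_follow_write_common(page, vma, flags))
		return false;
	return !needs_write_fault;
}
```

A pud or hugepd variant would then only supply its own notion of
"writable" and "still needs a write fault" and reuse the common part.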


-- 
Cheers,

David / dhildenb

