All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Guillaume Morin <guillaume@morinfr.org>
To: David Hildenbrand <david@redhat.com>
Cc: Guillaume Morin <guillaume@morinfr.org>,
	oleg@redhat.com, linux-kernel@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org, muchun.song@linux.dev
Subject: Re: [RFC][PATCH] uprobe: support for private hugetlb mappings
Date: Fri, 26 Apr 2024 02:09:54 +0200	[thread overview]
Message-ID: <Zirw0uINbP6GxFiK@bender.morinfr.org> (raw)
In-Reply-To: <e84a82b8-b788-499c-be79-e6dcb64ac969@redhat.com>

On 25 Apr 21:56, David Hildenbrand wrote:
>
> On 25.04.24 17:19, Guillaume Morin wrote:
> > On 24 Apr 23:00, David Hildenbrand wrote:
> > > > One issue here is that FOLL_FORCE|FOLL_WRITE is not implemented for
> > > > hugetlb mappings. However this was also on my TODO and I have a draft
> > > > patch that implements it.
> > > 
> > > Yes, I documented it back then and added sanity checks in GUP code to fence
> > > it off. Shouldn't be too hard to implement (famous last words) and would be
> > > the cleaner thing to use here once I manage to switch over to
> > > FOLL_WRITE|FOLL_FORCE to break COW.
> > 
> > Yes, my patch seems to be working. The hugetlb code is pretty simple.
> > And it allows ptrace and the proc pid mem file to work on the executable
> > private hugetlb mappings.
> > 
> > There is one thing I am unclear about though. hugetlb enforces that
> > huge_pte_write() is true on FOLL_WRITE in both the fault and
> > follow_page_mask paths. I am not sure if we can simply assume in the
> > hugetlb code that if the pte is not writable and this is a write fault
> > then we're in the FOLL_FORCE|FOLL_WRITE case.  Or do we want to keep the
> > checks simply not enforce it for FOLL_FORCE|FOLL_WRITE?
> > 
> > The latter is more complicated in the fault path because there is no
> > FAULT_FLAG_FORCE flag.
> > 
> 
> I just pushed something to
> 	https://github.com/davidhildenbrand/linux/tree/uprobes_cow
> 
> Only very lightly tested so far. Expect the worst :)


I'll try it out and send you the hugetlb bits

> 
> I still detest having the zapping logic there, but to get it all right I
> don't see a clean way around that.
> 
> 
> For hugetlb, we'd primarily have to implement the
> mm_walk_ops->hugetlb_entry() callback (well, and FOLL_FORCE).

For FOLL_FORCE, heer is my draft. Let me know if this is what you had in
mind. 


diff --git a/mm/gup.c b/mm/gup.c
index 1611e73b1121..ac60e0ae64e8 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1056,9 +1056,6 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 		if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
 			if (!(gup_flags & FOLL_FORCE))
 				return -EFAULT;
-			/* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */
-			if (is_vm_hugetlb_page(vma))
-				return -EFAULT;
 			/*
 			 * We used to let the write,force case do COW in a
 			 * VM_MAYWRITE VM_SHARED !VM_WRITE vma, so ptrace could
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3548eae42cf9..73f86eddf888 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5941,7 +5941,8 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 		       struct folio *pagecache_folio, spinlock_t *ptl,
 		       struct vm_fault *vmf)
 {
-	const bool unshare = flags & FAULT_FLAG_UNSHARE;
+	const bool make_writable = !(flags & FAULT_FLAG_UNSHARE) &&
+		(vma->vm_flags & VM_WRITE);
 	pte_t pte = huge_ptep_get(ptep);
 	struct hstate *h = hstate_vma(vma);
 	struct folio *old_folio;
@@ -5959,16 +5960,9 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * can trigger this, because hugetlb_fault() will always resolve
 	 * uffd-wp bit first.
 	 */
-	if (!unshare && huge_pte_uffd_wp(pte))
+	if (make_writable && huge_pte_uffd_wp(pte))
 		return 0;
 
-	/*
-	 * hugetlb does not support FOLL_FORCE-style write faults that keep the
-	 * PTE mapped R/O such as maybe_mkwrite() would do.
-	 */
-	if (WARN_ON_ONCE(!unshare && !(vma->vm_flags & VM_WRITE)))
-		return VM_FAULT_SIGSEGV;
-
 	/* Let's take out MAP_SHARED mappings first. */
 	if (vma->vm_flags & VM_MAYSHARE) {
 		set_huge_ptep_writable(vma, haddr, ptep);
@@ -5989,7 +5983,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 			folio_move_anon_rmap(old_folio, vma);
 			SetPageAnonExclusive(&old_folio->page);
 		}
-		if (likely(!unshare))
+		if (likely(make_writable))
 			set_huge_ptep_writable(vma, haddr, ptep);
 
 		delayacct_wpcopy_end();
@@ -6094,7 +6088,8 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 	spin_lock(ptl);
 	ptep = hugetlb_walk(vma, haddr, huge_page_size(h));
 	if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) {
-		pte_t newpte = make_huge_pte(vma, &new_folio->page, !unshare);
+		pte_t newpte = make_huge_pte(vma, &new_folio->page,
+					     make_writable);
 
 		/* Break COW or unshare */
 		huge_ptep_clear_flush(vma, haddr, ptep);
@@ -6883,6 +6878,17 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 }
 #endif /* CONFIG_USERFAULTFD */
 
+static bool is_force_follow(struct vm_area_struct* vma, unsigned int flags,
+			     struct page* page) {
+	if (vma->vm_flags & VM_WRITE)
+		return false;
+
+	if (!(flags & FOLL_FORCE))
+		return false;
+
+	return page && PageAnon(page) && page_mapcount(page) == 1;
+}
+
 struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 				      unsigned long address, unsigned int flags,
 				      unsigned int *page_mask)
@@ -6907,11 +6913,11 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 
 		if (!huge_pte_write(entry)) {
 			if (flags & FOLL_WRITE) {
-				page = NULL;
-				goto out;
-			}
-
-			if (gup_must_unshare(vma, flags, page)) {
+				if (!is_force_follow(vma, flags, page)) {
+					page = NULL;
+					goto out;
+				}
+			} else if (gup_must_unshare(vma, flags, page)) {
 				/* Tell the caller to do unsharing */
 				page = ERR_PTR(-EMLINK);
 				goto out;

> 
> Likely vaddr and PAGE_SIZE in uprobe_write_opcode() would have to be
> expanded to cover the full hugetlb page.
> 
> -- 
> Cheers,
> 
> David / dhildenb
> 

-- 
Guillaume Morin <guillaume@morinfr.org>

  reply	other threads:[~2024-04-26  0:10 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-19 18:37 [RFC][PATCH] uprobe: support for private hugetlb mappings Guillaume Morin
2024-04-22  9:39 ` David Hildenbrand
2024-04-22 18:11   ` Guillaume Morin
2024-04-22 18:59     ` David Hildenbrand
2024-04-22 20:53       ` Guillaume Morin
2024-04-24 20:09         ` David Hildenbrand
2024-04-24 20:44           ` Guillaume Morin
2024-04-24 21:00             ` David Hildenbrand
2024-04-25 15:19               ` Guillaume Morin
2024-04-25 15:42                 ` David Hildenbrand
2024-04-25 19:56                 ` David Hildenbrand
2024-04-26  0:09                   ` Guillaume Morin [this message]
2024-04-26  7:19                     ` David Hildenbrand
2024-04-26 19:55                       ` Guillaume Morin
2024-04-30 15:22                         ` Guillaume Morin
2024-04-30 18:21                           ` David Hildenbrand
2024-04-30 18:58                             ` Guillaume Morin
2024-04-30 19:25                         ` David Hildenbrand
2024-05-02  3:59                           ` Guillaume Morin
2024-05-16 17:44                             ` Guillaume Morin
2024-05-16 19:52                               ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zirw0uINbP6GxFiK@bender.morinfr.org \
    --to=guillaume@morinfr.org \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=oleg@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.