From: Guillaume Morin <guillaume@morinfr.org>
To: David Hildenbrand <david@redhat.com>
Cc: Guillaume Morin <guillaume@morinfr.org>,
	oleg@redhat.com, linux-kernel@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org, muchun.song@linux.dev
Subject: Re: [RFC][PATCH] uprobe: support for private hugetlb mappings
Date: Tue, 30 Apr 2024 20:58:31 +0200	[thread overview]
Message-ID: <ZjE_V0j8mYpNaW8K@bender.morinfr.org> (raw)
In-Reply-To: <35c88b38-96e6-4092-a7e0-fd22f6d784cc@redhat.com>

On 30 Apr 20:21, David Hildenbrand wrote:
> Sorry for not replying earlier, was busy with other stuff. I'll try getting
> that stuff into shape and send it out soonish.

No worries. Let me know what you think of the FOLL_FORCE patch when you
have a sec.

> > I went with using one write uprobe function with some additional
> > branches. I went back and forth between that and making them 2 different
> > functions.
> 
> All the folio_test_hugetlb() special casing is a bit suboptimal. Likely we
> want a separate variant, because we should be using hugetlb PTE functions
> consistently (e.g., huge_pte_uffd_wp() vs pte_uffd_wp(), softdirty does not
> exist, etc.)

Ok, I'll give this a whirl and send something probably tomorrow.

> > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> > index 2f4e88552d3f..8a33e380f7ea 100644
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -83,6 +83,10 @@ static const struct fs_parameter_spec hugetlb_fs_parameters[] = {
> >   	{}
> >   };
> > +bool hugetlbfs_mapping(struct address_space *mapping) {
> > +	return mapping->a_ops == &hugetlbfs_aops;
> 
> is_vm_hugetlb_page() might be what you are looking for.

I use hugetlbfs_mapping() in __uprobe_register(), which does an early
return based on the mapping alone when hugetlb is not supported. There is
no vma that I can get at this point (afaict).

I could refactor so we check this once we have a vma, but it seemed
cleaner to introduce it since there is already a shmem_mapping().

> >   }
> > -static void copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len)
> > +static void copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len, unsigned long page_mask)
> >   {
> >   	void *kaddr = kmap_atomic(page);
> > -	memcpy(dst, kaddr + (vaddr & ~PAGE_MASK), len);
> > +	memcpy(dst, kaddr + (vaddr & ~page_mask), len);
> >   	kunmap_atomic(kaddr);
> >   }
> 
> > -static void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len)
> > +static void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len, unsigned long page_mask)
> >   {
> >   	void *kaddr = kmap_atomic(page);
> > -	memcpy(kaddr + (vaddr & ~PAGE_MASK), src, len);
> > +	memcpy(kaddr + (vaddr & ~page_mask), src, len);
> >   	kunmap_atomic(kaddr);
> >   }
> 
> These two changes really are rather ugly ...
> 
> An why are they even required? We get a PAGE_SIZED-based subpage of a
> hugetlb page. We only kmap that one and copy within that one.
> 
> In other words, I don't think the copy_from_page() and copy_to_page()
> changes are even required when we consistently work on subpages and not
> suddenly on head pages.

The main reason is that the previous __replace_page() worked directly on
the full huge page, so adjusting after gup seemed to make more sense to
me. But now I guess it's not that useful (especially since we're going
with a different version of write_uprobe). I'll fix it.

(...)
> >   {
> >   	struct uwo_data *data = walk->private;
> >   	const bool is_register = !!is_swbp_insn(&data->opcode);
> > @@ -415,9 +417,12 @@ static int __write_opcode_pte(pte_t *ptep, unsigned long vaddr,
> >   	/* Unmap + flush the TLB, such that we can write atomically .*/
> >   	flush_cache_page(vma, vaddr, pte_pfn(pte));
> > -	pte = ptep_clear_flush(vma, vaddr, ptep);
> > +	if (folio_test_hugetlb(folio))
> > +		pte = huge_ptep_clear_flush(vma, vaddr, ptep);
> > +	else
> > +		pte = ptep_clear_flush(vma, vaddr, ptep);
> >   	copy_to_page(page, data->opcode_vaddr, &data->opcode,
> > -		     UPROBE_SWBP_INSN_SIZE);
> > +		     UPROBE_SWBP_INSN_SIZE, page_mask);
> >   	/* When unregistering, we may only zap a PTE if uffd is disabled ... */
> >   	if (is_register || userfaultfd_missing(vma))
> > @@ -443,13 +448,18 @@ static int __write_opcode_pte(pte_t *ptep, unsigned long vaddr,
> >   	if (!identical || folio_maybe_dma_pinned(folio))
> >   		goto remap;
> > -	/* Zap it and try to reclaim swap space. */
> > -	dec_mm_counter(mm, MM_ANONPAGES);
> > -	folio_remove_rmap_pte(folio, page, vma);
> > -	if (!folio_mapped(folio) && folio_test_swapcache(folio) &&
> > -	     folio_trylock(folio)) {
> > -		folio_free_swap(folio);
> > -		folio_unlock(folio);
> > +	if (folio_test_hugetlb(folio)) {
> > +		hugetlb_remove_rmap(folio);
> > +		large = false;
> > +	} else {
> > +		/* Zap it and try to reclaim swap space. */
> > +		dec_mm_counter(mm, MM_ANONPAGES);
> > +		folio_remove_rmap_pte(folio, page, vma);
> > +		if (!folio_mapped(folio) && folio_test_swapcache(folio) &&
> > +		folio_trylock(folio)) {
> > +			folio_free_swap(folio);
> > +			folio_unlock(folio);
> > +		}
> >   	}
> >   	folio_put(folio);
> > @@ -461,11 +471,29 @@ static int __write_opcode_pte(pte_t *ptep, unsigned long vaddr,
> >   	 */
> >   	smp_wmb();
> >   	/* We modified the page. Make sure to mark the PTE dirty. */
> > -	set_pte_at(mm, vaddr, ptep, pte_mkdirty(pte));
> > +	if (folio_test_hugetlb(folio))
> > +		set_huge_pte_at(mm, vaddr, ptep, huge_pte_mkdirty(pte),
> > +				(~page_mask) + 1);
> > +	else
> > +		set_pte_at(mm, vaddr, ptep, pte_mkdirty(pte));
> >   	return UWO_DONE;
> >   }
> > +static int __write_opcode_hugetlb(pte_t *ptep, unsigned long hmask,
> > +		unsigned long vaddr,
> > +		unsigned long next, struct mm_walk *walk)
> > +{
> > +	return __write_opcode(ptep, vaddr, hmask, walk);
> > +}
> > +
> > +static int __write_opcode_pte(pte_t *ptep, unsigned long vaddr,
> > +		unsigned long next, struct mm_walk *walk)
> > +{
> > +	return __write_opcode(ptep, vaddr, PAGE_MASK, walk);
> > +}
> > +
> >   static const struct mm_walk_ops write_opcode_ops = {
> > +	.hugetlb_entry		= __write_opcode_hugetlb,
> >   	.pte_entry		= __write_opcode_pte,
> >   	.walk_lock		= PGWALK_WRLOCK,
> >   };
> > @@ -492,7 +520,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> >   		unsigned long opcode_vaddr, uprobe_opcode_t opcode)
> >   {
> >   	struct uprobe *uprobe = container_of(auprobe, struct uprobe, arch);
> > -	const unsigned long vaddr = opcode_vaddr & PAGE_MASK;
> > +	unsigned long vaddr = opcode_vaddr & PAGE_MASK;
> >   	const bool is_register = !!is_swbp_insn(&opcode);
> >   	struct uwo_data data = {
> >   		.opcode = opcode,
> > @@ -503,6 +531,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> >   	struct mmu_notifier_range range;
> >   	int ret, ref_ctr_updated = 0;
> >   	struct page *page;
> > +	unsigned long page_size = PAGE_SIZE;
> >   	if (WARN_ON_ONCE(!is_cow_mapping(vma->vm_flags)))
> >   		return -EINVAL;
> > @@ -521,7 +550,14 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> >   	if (ret != 1)
> >   		goto out;
> > -	ret = verify_opcode(page, opcode_vaddr, &opcode);
> > +
> > +	if (is_vm_hugetlb_page(vma)) {
> > +		struct hstate *h = hstate_vma(vma);
> > +		page_size = huge_page_size(h);
> > +		vaddr &= huge_page_mask(h);
> > +		page = compound_head(page);
> 
> I think we should only adjust the range we pass to the mmu notifier and for
> walking the VMA range. But we should not adjust vaddr.
> 
> Further, we should not adjust the page if possible ... ideally, we'll treat
> hugetlb folios just like large folios here and operate on subpages.
> 
> Inside __write_opcode(), we can derive the the page of interest from
> data->opcode_vaddr.

Here you mean __write_opcode_hugetlb(), right? Since we're going with
the 2 independent variants. Just want to be 100% sure I am following.

> find_get_page() might need some thought, if it won't return a subpage of a
> hugetlb folio. Should be solvable by a wrapper, though.

We can zero out the sub-huge-page bits with the huge page mask in
vaddr_to_offset() in the hugetlb variant, like I do in __copy_insn(), and
that should work, no? Or would you prefer a wrapper?

Guillaume.

-- 
Guillaume Morin <guillaume@morinfr.org>
