From: David Hildenbrand <david@redhat.com>
To: Guillaume Morin <guillaume@morinfr.org>
Cc: oleg@redhat.com, linux-kernel@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org, muchun.song@linux.dev
Subject: Re: [RFC][PATCH] uprobe: support for private hugetlb mappings
Date: Mon, 22 Apr 2024 20:59:18 +0200
Message-ID: <22fcde31-16c4-42d0-ad99-568173ec4dd0@redhat.com>
In-Reply-To: <ZiaoZlGc_8ZV3736@bender.morinfr.org>

On 22.04.24 20:11, Guillaume Morin wrote:
> (Dropping Mike Kravetz as CC since he has retired and his email is no
> longer valid, adding Muchun since he's the current hugetlb maintainer,
> as well as linux-trace-kernel)
> 
> On 22 Apr 11:39, David Hildenbrand wrote:
>>
>> On 19.04.24 20:37, Guillaume Morin wrote:
>>> libhugetlbfs and the Intel iodlr code both allow remapping .text onto a
>>> hugetlb private mapping. It's also pretty easy to do it manually.
>>> One drawback of using this functionality is the lack of support for
>>> uprobes (NOTE: uprobe ignores shareable vmas).
>>>
>>> This patch adds support for private hugetlb mappings. It does require exposing
>>> some hugetlbfs innards and relies on copy_user_large_folio, which is only
>>> available when CONFIG_HUGETLBFS is used, so I had to use an ugly #ifdef.
>>>
>>> If there is some interest in applying this patch in some form or
>>> another, I am open to any refactoring suggestions (esp. getting rid of the
>>> #ifdef in uprobes.c). I tried to limit the
>>> amount of branching.
>>
>> All that hugetlb special casing .... oh my. What's the benefit, and why should
>> we be interested in making that code less clean -- to phrase it in a nice way
>> ;) ?
> 
> I do appreciate the nice phrasing. Believe me, I did try to limit the
> special casing to a minimum :-).
> 
> Outside of __replace_page, I added only 3-ish branches so I do not think
> it's *too* bad. The uprobe code is using PAGE_{SHIFT,MASK} quite liberally so I
> had to add calls to retrieve these for the hugetlb vmas.
> 
> __replace_page has a lot of special casing. I certainly agree (and
> unfortunately for me it's at the beginning of the patch :)).  It's doing
> something pretty uncommon outside of the mm code so it has to make a
> bunch of specific hugetlb calls. I am not quite sure how to improve it
> but if you have suggestions, I'd be happy to refactor.

See below.

> 
> The benefit - to me - is very clear. People do use hugetlb mappings to
> run code in production environments. The perf benefits are there for some
> workloads. Intel has published a whitepaper about it etc.
> Uprobes are a very good tool to do live tracing. If you can restart the
> process and reproduce, you should be able to disable hugetlb remapping
> but if you need to look at a live process, there are not many options.
> Not being able to use uprobes is crippling.

Please add all that as motivation to the patch description or cover letter.

> 
>> Yes, libhugetlbfs exists. But why do we have to support uprobes with it?
>> Nobody cared until now, why care now?
> 
> I think you could ask the same question for every new feature patch :)

I have to, because it usually indicates a lack of motivation in the 
cover-letter/patch description :P

People will have to maintain that code, and maintaining hugetlb code in 
odd places is no fun ...

> 
> Since the removal a few releases ago of the __morecore() hook in glibc,
> the main feature of libhugetlbfs is ELF segments remapping. I think
> there are definitely a lot of users that simply deal with this
> unnecessary limitation.
> 
> I am certainly not shoving this patch through anyone's throat if there
> is no interest. But we definitely find it a very useful feature ...

Let me try to see if we can get this done more cleanly.

One ugly part (in general here) is the custom page replacement in the 
registration part.

We are guaranteed to have a MAP_PRIVATE mapping. Instead of replacing 
pages ourselves (which we likely shouldn't do ...) ... maybe we could 
use FAULT_FLAG_UNSHARE faults such that we will get an anonymous folio 
populated. (like KSM does nowadays)

Punching FOLL_PIN|FOLL_LONGTERM into GUP would achieve the same thing, 
but using FOLL_WRITE would not work on many file systems. So maybe we 
have to trigger an unsharing fault ourselves.

That would do the page replacement for us and we "should" be able to 
look up an anonymous folio that we can then just modify, like ptrace would.

But then, there is also the unregistration part, with weird conditional page 
replacement. Zapping the anon page if the content matches the content of 
the original page is one thing. But why are we replacing an existing 
anonymous page with a new anonymous page when the content from the 
original page differs (but matches the one from the just-copied page)?

I'll have to further think about that one. It's all a bit nasty.


One thing to note is that hugetlb folios don't grow on trees. Likely, 
many setups *don't* reserve extra hugetlb folios, and you might easily 
run out of free hugetlb folios that you can use to break COW here 
(replacing a file hugetlb page with a fresh anon hugetlb page). It's 
likely easy to make register or unregister fail.
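
To make the pool concern concrete, here is a minimal user-space sketch (not
part of the patch) that estimates how many hugetlb pages are actually spare.
It assumes the HugePages_* counters as reported by /proc/meminfo; the helper
names parse_hugetlb_pool and spare_hugetlb_pages are made up for illustration,
and the estimate deliberately ignores surplus pages that overcommit might
still allow.

```python
# Sketch only: helper names are hypothetical, not from the patch or the kernel.

def parse_hugetlb_pool(meminfo_text):
    """Collect the HugePages_* counters from /proc/meminfo-style text."""
    counters = {}
    for line in meminfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the page counters; "Hugepagesize" and friends are skipped.
        if sep and key.startswith("HugePages_"):
            counters[key] = int(value.split()[0])
    return counters


def spare_hugetlb_pages(counters):
    """Free pages that are not already promised to existing reservations.

    Assumption: overcommit (surplus) pages are ignored, so this is a
    conservative estimate of what a COW break could draw from.
    """
    free = counters.get("HugePages_Free", 0)
    reserved = counters.get("HugePages_Rsvd", 0)
    return free - reserved
```

Feeding it the contents of /proc/meminfo on a box where every free page is
already reserved yields 0 spare pages, which is the situation where breaking
COW for a probe would be expected to fail.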

-- 
Cheers,

David / dhildenb

