From: "Huang, Ying" <ying.huang@intel.com>
To: Gregory Price <gregory.price@memverge.com>
Cc: Gregory Price <gourry.memverge@gmail.com>,  <linux-mm@kvack.org>,
	<linux-api@vger.kernel.org>,  <linux-arch@vger.kernel.org>,
	<linux-kselftest@vger.kernel.org>,
	 <linux-kernel@vger.kernel.org>, <dan.j.williams@intel.com>,
	 <honggyu.kim@sk.com>,  <corbet@lwn.net>, <arnd@arndb.de>,
	 <luto@kernel.org>,  <akpm@linux-foundation.org>,
	<shuah@kernel.org>
Subject: Re: [RFC v3 0/3] move_phys_pages syscall - migrate page contents given
Date: Wed, 20 Mar 2024 14:01:17 +0800
Message-ID: <87r0g5saqa.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <Zfpohg3EGxxOEcWg@memverge.com> (Gregory Price's message of "Wed, 20 Mar 2024 00:39:34 -0400")

Gregory Price <gregory.price@memverge.com> writes:

> On Wed, Mar 20, 2024 at 10:48:44AM +0800, Huang, Ying wrote:
>> Gregory Price <gourry.memverge@gmail.com> writes:
>> 
>> > Doing this reverse-translation outside of the kernel requires considerable
>> > space and compute, and it will have to be performed again by the existing
>> > system calls.  Much of this work can be avoided if the pages can be
>> > migrated directly with physical memory addressing.
>> 
>> One difficulty with using physical addresses is that we lack the
>> user-space policy information needed to make a decision.  For example,
>> users may want to pin some pages in DRAM to improve latency, or pin some
>> pages in CXL memory to do some best effort work.  To make the correct
>> decision, we need PID and virtual address.
>> 
>
> I think of this as a second or third order problem.  The core problem
> right now isn't the practicality of how userland would actually use this
> interface - the core problem is whether the data generated by offloaded
> monitoring is even worth collecting and operating on in the first place.  
>
> So this is a quick hack to do some research about whether it's even
> worth developing the whole abstraction described by Willy.
>
> This is why it's labeled RFC.  I upped a v3 because I know of two groups
> actively looking at using it for research, and because the folio updates
> broke the old version.  It's also easier for me to engage through the
> list than via private channels for this particular work.
>
>
> Do I suggest we merge this interface as-is? No, too many concerns about
> side channels.  However, it's a clean reuse of move_pages code to
> bootstrap the investigation, and it at least gets the gears turning.

Got it!  Thanks for the detailed explanation.

I think that one of the difficulties of offloaded monitoring is that
it's hard to obey these user-specified policies.  The policies may
become more complex in the future, for example, allocating DRAM among
workloads.
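
As a rough illustration of why the lookup key matters (a sketch only:
`enum tier`, `struct pin_policy`, and `may_migrate` are hypothetical names
made up for this example, not kernel interfaces), a pinning policy is
naturally expressed per PID and virtual address range.  A hot physical page
reported by a device cannot be checked against such a table until it has
been reverse-mapped to a (pid, vaddr) pair:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical memory tiers; not a kernel enum. */
enum tier { TIER_DRAM, TIER_CXL };

/* Hypothetical user policy: keep [start, end) of process `pid` in `pinned`. */
struct pin_policy {
    pid_t     pid;
    uintptr_t start;
    uintptr_t end;      /* exclusive */
    enum tier pinned;
};

/*
 * Return true if moving the page at (pid, vaddr) to `target` violates no
 * policy.  The first matching entry wins; pages covered by no entry are
 * free to move.
 */
bool may_migrate(const struct pin_policy *tbl, size_t n,
                 pid_t pid, uintptr_t vaddr, enum tier target)
{
    for (size_t i = 0; i < n; i++) {
        const struct pin_policy *p = &tbl[i];

        if (p->pid == pid && vaddr >= p->start && vaddr < p->end)
            return p->pinned == target;
    }
    return true;
}
```

With only a physical address in hand, neither `pid` nor `vaddr` is
available, which is the gap described above.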

> Example notes from a sidebar earlier today:
>
> * An interesting proposal from Dan Williams would be to provide some
>   sort of `/sys/.../memory_tiering/tierN/promote_hot` interface, with
>   a callback mechanism into the relevant hardware drivers that allows
>   for this to be abstracted.  This could be done on some interval and
>   some threshold (# pages, hotness threshold, etc).
>
>
> The code to execute promotions ends up looking like what I have now:
>
> 1) Validate that the page is eligible to be promoted by walking the vmas
> 2) Invoke the existing move_pages code
>
> The above idea can be implemented trivially in userland without
> having to plumb through a whole brand new callback system.
>
>
> Sometimes you have to post stupid ideas to get to the good ones :]
>
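
The two steps quoted above might be sketched in userland roughly as follows.
This is an illustration under assumptions, not the RFC's code: it uses the
existing move_pages(2) syscall (so it works on virtual addresses of a known
PID, not physical addresses), `eligible_fn`, `filter_eligible`, and
`promote_hot` are hypothetical names, and `is_page_aligned` is only a
placeholder for the real VMA-based validation, assuming 4 KiB pages.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef MPOL_MF_MOVE
#define MPOL_MF_MOVE (1 << 1)   /* same value as in <linux/mempolicy.h> */
#endif

/* Hypothetical eligibility predicate supplied by the caller. */
typedef bool (*eligible_fn)(pid_t pid, void *vaddr);

/* Placeholder predicate: accepts only 4 KiB-aligned addresses. */
bool is_page_aligned(pid_t pid, void *vaddr)
{
    (void)pid;
    return ((uintptr_t)vaddr & 4095u) == 0;
}

/*
 * Step 1: compact `pages` in place so that only eligible entries remain.
 * Returns the number of entries kept.
 */
size_t filter_eligible(pid_t pid, void **pages, size_t n, eligible_fn ok)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++)
        if (ok(pid, pages[i]))
            pages[kept++] = pages[i];
    return kept;
}

/*
 * Step 2: hand the survivors to move_pages(2).  `nodes` and `status` must
 * each have room for n entries.  Returns the raw syscall result, or 0 if
 * nothing was eligible.
 */
long promote_hot(pid_t pid, void **pages, size_t n, int target_node,
                 int *nodes, int *status, eligible_fn ok)
{
    size_t kept = filter_eligible(pid, pages, n, ok);

    if (kept == 0)
        return 0;
    for (size_t i = 0; i < kept; i++)
        nodes[i] = target_node;
    return syscall(SYS_move_pages, pid, (unsigned long)kept, pages, nodes,
                   status, MPOL_MF_MOVE);
}
```

A caller would pass the hot addresses for one PID plus a target node, then
inspect `status[i]` for per-page results, as move_pages(2) documents.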

--
Best Regards,
Huang, Ying

Thread overview: 16+ messages
2024-03-19 17:26 [RFC v3 0/3] move_phys_pages syscall - migrate page contents given Gregory Price
2024-03-19 17:26 ` [RFC v3 1/3] mm/migrate: refactor add_page_for_migration for code re-use Gregory Price
2024-03-19 17:26 ` [RFC v3 2/3] mm/migrate: Create move_phys_pages syscall Gregory Price
2024-03-19 17:26 ` [RFC v3 3/3] ktest: sys_move_phys_pages ktest Gregory Price
2024-03-19 17:52   ` Matthew Wilcox
2024-03-19 18:08     ` Matthew Wilcox
2024-03-19 18:16       ` [RFC v3 3/3] ktest: sys_move_phys_pages ktesty Gregory Price
2024-03-19 18:18         ` Gregory Price
2024-03-19 18:14     ` [RFC v3 3/3] ktest: sys_move_phys_pages ktest Gregory Price
2024-03-19 18:20       ` Matthew Wilcox
2024-03-19 18:32         ` Gregory Price
2024-03-19 18:38           ` Matthew Wilcox
2024-03-19 18:50             ` Gregory Price
2024-03-20  2:48 ` [RFC v3 0/3] move_phys_pages syscall - migrate page contents given Huang, Ying
2024-03-20  4:39   ` Gregory Price
2024-03-20  6:01     ` Huang, Ying [this message]