KVM ARM Archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Will Deacon <will@kernel.org>
Cc: David Hildenbrand <david@redhat.com>,
	Vishal Annapurve <vannapurve@google.com>,
	 Quentin Perret <qperret@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Fuad Tabba <tabba@google.com>,
	 kvm@vger.kernel.org, kvmarm@lists.linux.dev,
	pbonzini@redhat.com,  chenhuacai@kernel.org, mpe@ellerman.id.au,
	anup@brainfault.org,  paul.walmsley@sifive.com,
	palmer@dabbelt.com, aou@eecs.berkeley.edu,
	 viro@zeniv.linux.org.uk, brauner@kernel.org,
	akpm@linux-foundation.org,  xiaoyao.li@intel.com,
	yilun.xu@intel.com, chao.p.peng@linux.intel.com,
	 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
	 yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com,
	mic@digikod.net,  vbabka@suse.cz, ackerleytng@google.com,
	mail@maciej.szmigiero.name,  michael.roth@amd.com,
	wei.w.wang@intel.com, liam.merwick@oracle.com,
	 isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com,
	 suzuki.poulose@arm.com, steven.price@arm.com,
	quic_mnalajal@quicinc.com,  quic_tsoni@quicinc.com,
	quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
	 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	catalin.marinas@arm.com,  james.morse@arm.com,
	yuzenghui@huawei.com, oliver.upton@linux.dev,  maz@kernel.org,
	keirf@google.com, linux-mm@kvack.org
Subject: Re: folio_mmapped
Date: Wed, 3 Apr 2024 17:15:19 -0700	[thread overview]
Message-ID: <Zg3xF7dTtx6hbmZj@google.com> (raw)
In-Reply-To: <20240327193454.GB11880@willie-the-truck>

On Wed, Mar 27, 2024, Will Deacon wrote:
> Hi again, David,
> 
> On Fri, Mar 22, 2024 at 06:52:14PM +0100, David Hildenbrand wrote:
> > On 19.03.24 15:31, Will Deacon wrote:
> > sorry for the late reply!
> 
> Bah, you and me both!

Hold my beer ;-)

> > > On Tue, Mar 19, 2024 at 11:26:05AM +0100, David Hildenbrand wrote:
> > > > On 19.03.24 01:10, Sean Christopherson wrote:
> > > > > On Mon, Mar 18, 2024, Vishal Annapurve wrote:
> > > > > > On Mon, Mar 18, 2024 at 3:02 PM David Hildenbrand <david@redhat.com> wrote:
> > >  From the pKVM side, we're working on guest_memfd primarily to avoid
> > > diverging from what other CoCo solutions end up using, but if it gets
> > > de-featured (e.g. no huge pages, no GUP, no mmap) compared to what we do
> > > today with anonymous memory, then it's a really hard sell to switch over
> > > from what we have in production. We're also hoping that, over time,
> > > guest_memfd will become more closely integrated with the mm subsystem to
> > > enable things like hypervisor-assisted page migration, which we would
> > > love to have.
> > 
> > Reading Sean's reply, he has a different view on that. And I think that's
> > the main issue: there are too many different use cases and too many
> > different requirements that could turn guest_memfd into something that maybe
> > it really shouldn't be.
> 
> No argument there, and we're certainly not tied to any specific
> mechanism on the pKVM side. Maybe Sean can chime in, but we've
> definitely spoken about migration being a goal in the past, so I guess
> something changed since then on the guest_memfd side.

What's "hypervisor-assisted page migration"?  More specifically, what's the
mechanism that drives it?

I am not opposed to page migration itself, what I am opposed to is adding deep
integration with core MM to do some of the fancy/complex things that lead to page
migration.

Another thing I want to avoid is taking a hard dependency on "struct page", so
that we can have line of sight to eliminating "struct page" overhead for guest_memfd,
but that's definitely a more distant future concern.

> > This makes sense: shared memory is neither nasty nor special. You can
> > migrate it, swap it out, map it into page tables, GUP it, ... without any
> > issues.
> 
> Slight aside and not wanting to derail the discussion, but we have a few
> different types of sharing which we'll have to consider:
> 
>   * Memory shared from the host to the guest. This remains owned by the
>     host and the normal mm stuff can be made to work with it.

This seems like it should be !guest_memfd, i.e. can't be converted to guest
private (without first unmapping it from the host, but at that point it's
completely different memory, for all intents and purposes).

>   * Memory shared from the guest to the host. This remains owned by the
>     guest, so there's a pin on the pages and the normal mm stuff can't
>     work without co-operation from the guest (see next point).

Do you happen to have a list of exactly what you mean by "normal mm stuff"?  I
am not at all opposed to supporting .mmap(), because long term I also want to
use guest_memfd for non-CoCo VMs.  But I want to be very conservative with respect
to what is allowed for guest_memfd.   E.g. host userspace can map guest_memfd,
and do operations that are directly related to its mapping, but that's about it.

>   * Memory relinquished from the guest to the host. This actually unmaps
>     the pages from the host and transfers ownership back to the host,
>     after which the pin is dropped and the normal mm stuff can work. We
>     use this to implement ballooning.
> 
> I suppose the main thing is that the architecture backend can deal with
> these states, so the core code shouldn't really care as long as it's
> aware that shared memory may be pinned.
> 
> > So if I would describe some key characteristics of guest_memfd as of today,
> > it would probably be:
> > 
> > 1) Memory is unmovable and unswappable. Right from the beginning, it is
> >    allocated as unmovable (e.g., not placed on ZONE_MOVABLE, CMA, ...).
> > 2) Memory is inaccessible. It cannot be read from user space, the
> >    kernel, it cannot be GUP'ed ... only some mechanisms might end up
> >    touching that memory (e.g., hibernation, /proc/kcore) might end up
> >    touching it "by accident", and we usually can handle these cases.
> > 3) Memory can be discarded in page granularity. There should be no cases
> >    where you cannot discard memory to over-allocate memory for private
> >    pages that have been replaced by shared pages otherwise.
> > 4) Page tables are not required (well, it's an memfd), and the fd could
> >    in theory be passed to other processes.o

More broadly, no VMAs are required.  The lack of stage-1 page tables are nice to
have; the lack of VMAs means that guest_memfd isn't playing second fiddle, e.g.
it's not subject to VMA protections, isn't restricted to host mapping size, etc.

> > Having "ordinary shared" memory in there implies that 1) and 2) will have to
> > be adjusted for them, which kind-of turns it "partially" into ordinary shmem
> > again.
> 
> Yes, and we'd also need a way to establish hugepages (where possible)
> even for the *private* memory so as to reduce the depth of the guest's
> stage-2 walk.

Yeah, hugepage support for guest_memfd is very much a WIP.  Getting _something_
is easy, getting the right thing is much harder.

  parent reply	other threads:[~2024-04-04  0:15 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-22 16:10 [RFC PATCH v1 00/26] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 01/26] KVM: Split KVM memory attributes into user and kernel attributes Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 02/26] KVM: Introduce kvm_gmem_get_pfn_locked(), which retains the folio lock Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 03/26] KVM: Add restricted support for mapping guestmem by the host Fuad Tabba
2024-02-22 16:28   ` David Hildenbrand
2024-02-26  8:58     ` Fuad Tabba
2024-02-26  9:57       ` David Hildenbrand
2024-02-26 17:30         ` Fuad Tabba
2024-02-27  7:40           ` David Hildenbrand
2024-02-22 16:10 ` [RFC PATCH v1 04/26] KVM: Don't allow private attribute to be set if mapped by host Fuad Tabba
2024-04-17 23:27   ` Sean Christopherson
2024-04-18 10:54   ` David Hildenbrand
2024-02-22 16:10 ` [RFC PATCH v1 05/26] KVM: Don't allow private attribute to be removed for unmappable memory Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 06/26] KVM: Implement kvm_(read|/write)_guest_page for private memory slots Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 07/26] KVM: arm64: Turn llist of pinned pages into an rb-tree Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 08/26] KVM: arm64: Implement MEM_RELINQUISH SMCCC hypercall Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 09/26] KVM: arm64: Strictly check page type in MEM_RELINQUISH hypercall Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 10/26] KVM: arm64: Avoid unnecessary unmap walk " Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 11/26] KVM: arm64: Add initial support for KVM_CAP_EXIT_HYPERCALL Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 12/26] KVM: arm64: Allow userspace to receive SHARE and UNSHARE notifications Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 13/26] KVM: arm64: Create hypercall return handler Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 14/26] KVM: arm64: Refactor code around handling return from host to guest Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 15/26] KVM: arm64: Rename kvm_pinned_page to kvm_guest_page Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 16/26] KVM: arm64: Add a field to indicate whether the guest page was pinned Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 17/26] KVM: arm64: Do not allow changes to private memory slots Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 18/26] KVM: arm64: Skip VMA checks for slots without userspace address Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 19/26] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 20/26] KVM: arm64: Track sharing of memory from protected guest to host Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 21/26] KVM: arm64: Mark a protected VM's memory as unmappable at initialization Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 22/26] KVM: arm64: Handle unshare on way back to guest entry rather than exit Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 23/26] KVM: arm64: Check that host unmaps memory unshared by guest Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 24/26] KVM: arm64: Add handlers for kvm_arch_*_set_memory_attributes() Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 25/26] KVM: arm64: Enable private memory support when pKVM is enabled Fuad Tabba
2024-02-22 16:10 ` [RFC PATCH v1 26/26] KVM: arm64: Enable private memory kconfig for arm64 Fuad Tabba
2024-02-22 23:43 ` [RFC PATCH v1 00/26] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Elliot Berman
2024-02-23  0:35   ` folio_mmapped Matthew Wilcox
2024-02-26  9:28     ` folio_mmapped David Hildenbrand
2024-02-26 21:14       ` folio_mmapped Elliot Berman
2024-02-27 14:59         ` folio_mmapped David Hildenbrand
2024-02-28 10:48           ` folio_mmapped Quentin Perret
2024-02-28 11:11             ` folio_mmapped David Hildenbrand
2024-02-28 12:44               ` folio_mmapped Quentin Perret
2024-02-28 13:00                 ` folio_mmapped David Hildenbrand
2024-02-28 13:34                   ` folio_mmapped Quentin Perret
2024-02-28 18:43                     ` folio_mmapped Elliot Berman
2024-02-28 18:51                       ` Quentin Perret
2024-02-29 10:04                     ` folio_mmapped David Hildenbrand
2024-02-29 19:01                       ` folio_mmapped Fuad Tabba
2024-03-01  0:40                         ` folio_mmapped Elliot Berman
2024-03-01 11:16                           ` folio_mmapped David Hildenbrand
2024-03-04 12:53                             ` folio_mmapped Quentin Perret
2024-03-04 20:22                               ` folio_mmapped David Hildenbrand
2024-03-01 11:06                         ` folio_mmapped David Hildenbrand
2024-03-04 12:36                       ` folio_mmapped Quentin Perret
2024-03-04 19:04                         ` folio_mmapped Sean Christopherson
2024-03-04 20:17                           ` folio_mmapped David Hildenbrand
2024-03-04 21:43                             ` folio_mmapped Elliot Berman
2024-03-04 21:58                               ` folio_mmapped David Hildenbrand
2024-03-19  9:47                                 ` folio_mmapped Quentin Perret
2024-03-19  9:54                                   ` folio_mmapped David Hildenbrand
2024-03-18 17:06                             ` folio_mmapped Vishal Annapurve
2024-03-18 22:02                               ` folio_mmapped David Hildenbrand
2024-03-18 23:07                                 ` folio_mmapped Vishal Annapurve
2024-03-19  0:10                                   ` folio_mmapped Sean Christopherson
2024-03-19 10:26                                     ` folio_mmapped David Hildenbrand
2024-03-19 13:19                                       ` folio_mmapped David Hildenbrand
2024-03-19 14:31                                       ` folio_mmapped Will Deacon
2024-03-19 23:54                                         ` folio_mmapped Elliot Berman
2024-03-22 16:36                                           ` Will Deacon
2024-03-22 18:46                                             ` Elliot Berman
2024-03-27 19:31                                               ` Will Deacon
2024-03-22 17:52                                         ` folio_mmapped David Hildenbrand
2024-03-22 21:21                                           ` folio_mmapped David Hildenbrand
2024-03-26 22:04                                             ` folio_mmapped Elliot Berman
2024-03-27 17:50                                               ` folio_mmapped David Hildenbrand
2024-03-27 19:34                                           ` folio_mmapped Will Deacon
2024-03-28  9:06                                             ` folio_mmapped David Hildenbrand
2024-03-28 10:10                                               ` folio_mmapped Quentin Perret
2024-03-28 10:32                                                 ` folio_mmapped David Hildenbrand
2024-03-28 10:58                                                   ` folio_mmapped Quentin Perret
2024-03-28 11:41                                                     ` folio_mmapped David Hildenbrand
2024-03-29 18:38                                                       ` folio_mmapped Vishal Annapurve
2024-04-04  0:15                                             ` Sean Christopherson [this message]
2024-03-19 15:04                                       ` folio_mmapped Sean Christopherson
2024-03-22 17:16                                         ` folio_mmapped David Hildenbrand
2024-02-26  9:03   ` [RFC PATCH v1 00/26] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Fuad Tabba
2024-02-23 12:00 ` Alexandru Elisei
2024-02-26  9:05   ` Fuad Tabba
2024-02-26  9:47 ` David Hildenbrand
2024-02-27  9:37   ` Fuad Tabba
2024-02-27 14:41     ` David Hildenbrand
2024-02-27 14:49       ` David Hildenbrand
2024-02-28  9:57       ` Fuad Tabba
2024-02-28 10:12         ` David Hildenbrand
2024-02-28 14:01           ` Quentin Perret
2024-02-29  9:51             ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zg3xF7dTtx6hbmZj@google.com \
    --to=seanjc@google.com \
    --cc=ackerleytng@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=amoorthy@google.com \
    --cc=anup@brainfault.org \
    --cc=aou@eecs.berkeley.edu \
    --cc=brauner@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=chao.p.peng@linux.intel.com \
    --cc=chenhuacai@kernel.org \
    --cc=david@redhat.com \
    --cc=dmatlack@google.com \
    --cc=isaku.yamahata@gmail.com \
    --cc=isaku.yamahata@intel.com \
    --cc=james.morse@arm.com \
    --cc=jarkko@kernel.org \
    --cc=keirf@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=liam.merwick@oracle.com \
    --cc=linux-mm@kvack.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=maz@kernel.org \
    --cc=mic@digikod.net \
    --cc=michael.roth@amd.com \
    --cc=mpe@ellerman.id.au \
    --cc=oliver.upton@linux.dev \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=pbonzini@redhat.com \
    --cc=qperret@google.com \
    --cc=quic_cvanscha@quicinc.com \
    --cc=quic_mnalajal@quicinc.com \
    --cc=quic_pderrin@quicinc.com \
    --cc=quic_pheragu@quicinc.com \
    --cc=quic_svaddagi@quicinc.com \
    --cc=quic_tsoni@quicinc.com \
    --cc=steven.price@arm.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tabba@google.com \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wei.w.wang@intel.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=xiaoyao.li@intel.com \
    --cc=yilun.xu@intel.com \
    --cc=yu.c.zhang@linux.intel.com \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).