Re: [PATCH 06/11] KVM: guest_memfd: Add hook for initializing memory

LKML Archive mirror
 help / color / mirror / Atom feed

From: Xu Yilun <yilun.xu@linux.intel.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	seanjc@google.com, michael.roth@amd.com,
	isaku.yamahata@intel.com
Subject: Re: [PATCH 06/11] KVM: guest_memfd: Add hook for initializing memory
Date: Mon, 22 Apr 2024 18:53:02 +0800	[thread overview]
Message-ID: <ZiZBjtQvUuuqqKNF@yilunxu-OptiPlex-7050> (raw)
In-Reply-To: <20240404185034.3184582-7-pbonzini@redhat.com>

On Thu, Apr 04, 2024 at 02:50:28PM -0400, Paolo Bonzini wrote:
> guest_memfd pages are generally expected to be in some arch-defined
> initial state prior to using them for guest memory. For SEV-SNP this
> initial state is 'private', or 'guest-owned', and requires additional
> operations to move these pages into a 'private' state by updating the
> corresponding entries the RMP table.
> 
> Allow for an arch-defined hook to handle updates of this sort, and go
> ahead and implement one for x86 so KVM implementations like AMD SVM can
> register a kvm_x86_ops callback to handle these updates for SEV-SNP
> guests.
> 
> The preparation callback is always called when allocating/grabbing
> folios via gmem, and it is up to the architecture to keep track of
> whether or not the pages are already in the expected state (e.g. the RMP
> table in the case of SEV-SNP).
> 
> In some cases, it is necessary to defer the preparation of the pages to
> handle things like in-place encryption of initial guest memory payloads
> before marking these pages as 'private'/'guest-owned'.  Add an argument
> (always true for now) to kvm_gmem_get_folio() that allows for the
> preparation callback to be bypassed.  To detect possible issues in

IIUC, we have 2 dedicated flows.
1 kvm_gmem_get_pfn() or kvm_gmem_allocate()
  a. kvm_gmem_get_folio()
  b. gmem_prepare() for RMP

2 in-place encryption or whatever
  a. kvm_gmem_get_folio(FGP_CREAT_ONLY)
  b. in-place encryption
  c. gmem_prepare() for RMP

Could we move gmem_prepare() out of kvm_gmem_get_folio(), then we could
have straightforward flow for each case, and don't have to have an
argument to pospone gmem_prepare().

> the way userspace initializes memory, it is only possible to add an
> unprepared page if it is not already included in the filemap.
> 
> Link: https://lore.kernel.org/lkml/ZLqVdvsF11Ddo7Dq@google.com/
> Co-developed-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> Message-Id: <20231230172351.574091-5-michael.roth@amd.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  1 +
>  arch/x86/kvm/x86.c                 |  6 +++
>  include/linux/kvm_host.h           |  5 +++
>  virt/kvm/Kconfig                   |  4 ++
>  virt/kvm/guest_memfd.c             | 65 ++++++++++++++++++++++++++++--
>  6 files changed, 78 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 5187fcf4b610..d26fcad13e36 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -139,6 +139,7 @@ KVM_X86_OP(vcpu_deliver_sipi_vector)
>  KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
>  KVM_X86_OP_OPTIONAL(get_untagged_addr)
>  KVM_X86_OP_OPTIONAL(alloc_apic_backing_page)
> +KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
>  
>  #undef KVM_X86_OP
>  #undef KVM_X86_OP_OPTIONAL
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 01c69840647e..f101fab0040e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1809,6 +1809,7 @@ struct kvm_x86_ops {
>  
>  	gva_t (*get_untagged_addr)(struct kvm_vcpu *vcpu, gva_t gva, unsigned int flags);
>  	void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
> +	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
>  };
>  
>  struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2d2619d3eee4..972524ddcfdb 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13598,6 +13598,12 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
>  
> +#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
> +int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order)
> +{
> +	return static_call(kvm_x86_gmem_prepare)(kvm, pfn, gfn, max_order);
> +}
> +#endif
>  
>  int kvm_spec_ctrl_test_value(u64 value)
>  {
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 48f31dcd318a..33ed3b884a6b 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2445,4 +2445,9 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>  }
>  #endif /* CONFIG_KVM_PRIVATE_MEM */
>  
> +#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
> +int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
> +bool kvm_arch_gmem_prepare_needed(struct kvm *kvm);
> +#endif
> +
>  #endif
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 29b73eedfe74..ca870157b2ed 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -109,3 +109,7 @@ config KVM_GENERIC_PRIVATE_MEM
>         select KVM_GENERIC_MEMORY_ATTRIBUTES
>         select KVM_PRIVATE_MEM
>         bool
> +
> +config HAVE_KVM_GMEM_PREPARE
> +       bool
> +       depends on KVM_PRIVATE_MEM
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index e5b3cd02b651..486748e65f36 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -13,12 +13,60 @@ struct kvm_gmem {
>  	struct list_head entry;
>  };
>  
> -static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
> +#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
> +bool __weak kvm_arch_gmem_prepare_needed(struct kvm *kvm)
> +{
> +	return false;
> +}
> +#endif

In which case HAVE_KVM_GMEM_PREPARE is selected but
gmem_prepare_needed() is never implemented?  Then all gmem_prepare stuff
are actually dead code.  Maybe we don't need this weak stub?

> +
> +static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct folio *folio)
> +{
> +#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
> +	struct list_head *gmem_list = &inode->i_mapping->i_private_list;
> +	struct kvm_gmem *gmem;
> +
> +	list_for_each_entry(gmem, gmem_list, entry) {
> +		struct kvm_memory_slot *slot;
> +		struct kvm *kvm = gmem->kvm;
> +		struct page *page;
> +		kvm_pfn_t pfn;
> +		gfn_t gfn;
> +		int rc;
> +
> +		if (!kvm_arch_gmem_prepare_needed(kvm))
> +			continue;
> +
> +		slot = xa_load(&gmem->bindings, index);
> +		if (!slot)
> +			continue;
> +
> +		page = folio_file_page(folio, index);
> +		pfn = page_to_pfn(page);
> +		gfn = slot->base_gfn + index - slot->gmem.pgoff;
> +		rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, compound_order(compound_head(page)));
> +		if (rc) {
> +			pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx, error %d.\n",
> +					    index, rc);
> +			return rc;
> +		}
> +	}
> +
> +#endif
> +	return 0;
> +}
> +
> +static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool prepare)
>  {
>  	struct folio *folio;
> +	fgf_t fgp_flags = FGP_LOCK | FGP_ACCESSED | FGP_CREAT;
> +
> +	if (!prepare)
> +		fgp_flags |= FGP_CREAT_ONLY;
>  
>  	/* TODO: Support huge pages. */
> -	folio = filemap_grab_folio(inode->i_mapping, index);
> +	folio = __filemap_get_folio(inode->i_mapping, index, fgp_flags,
> +				    mapping_gfp_mask(inode->i_mapping));
>  	if (IS_ERR_OR_NULL(folio))
>  		return folio;
>  
> @@ -41,6 +89,15 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
>  		folio_mark_uptodate(folio);
>  	}
>  
> +	if (prepare) {
> +		int r =	kvm_gmem_prepare_folio(inode, index, folio);
> +		if (r < 0) {
> +			folio_unlock(folio);
> +			folio_put(folio);
> +			return ERR_PTR(r);
> +		}
> +	}
> +

Do we still need to prepare the page if it is hwpoisoned? I see the
hwpoisoned check is outside, in kvm_gmem_get_pfn().

Thanks,
Yilun

>  	/*
>  	 * Ignore accessed, referenced, and dirty flags.  The memory is
>  	 * unevictable and there is no storage to write back to.
> @@ -145,7 +202,7 @@ static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
>  			break;
>  		}
>  
> -		folio = kvm_gmem_get_folio(inode, index);
> +		folio = kvm_gmem_get_folio(inode, index, true);
>  		if (IS_ERR_OR_NULL(folio)) {
>  			r = folio ? PTR_ERR(folio) : -ENOMEM;
>  			break;
> @@ -505,7 +562,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>  		goto out_fput;
>  	}
>  
> -	folio = kvm_gmem_get_folio(file_inode(file), index);
> +	folio = kvm_gmem_get_folio(file_inode(file), index, true);
>  	if (!folio) {
>  		r = -ENOMEM;
>  		goto out_fput;
> -- 
> 2.43.0
> 
> 
>

next prev parent reply	other threads:[~2024-04-22 10:58 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-04 18:50 [PATCH 00/11] KVM: guest_memfd: New hooks and functionality for SEV-SNP and TDX Paolo Bonzini
2024-04-04 18:50 ` [PATCH 01/11] mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory Paolo Bonzini
2024-04-29 13:14   ` Vlastimil Babka
2024-04-04 18:50 ` [PATCH 02/11] KVM: guest_memfd: Use AS_INACCESSIBLE when creating guest_memfd inode Paolo Bonzini
2024-04-29 13:15   ` Vlastimil Babka
2024-04-04 18:50 ` [PATCH 03/11] KVM: guest_memfd: pass error up from filemap_grab_folio Paolo Bonzini
2024-04-04 18:50 ` [PATCH 04/11] filemap: add FGP_CREAT_ONLY Paolo Bonzini
2024-04-25  5:52   ` Paolo Bonzini
2024-04-29 13:26     ` Vlastimil Babka
2024-04-04 18:50 ` [PATCH 05/11] KVM: guest_memfd: limit overzealous WARN Paolo Bonzini
2024-04-04 18:50 ` [PATCH 06/11] KVM: guest_memfd: Add hook for initializing memory Paolo Bonzini
2024-04-22 10:53   ` Xu Yilun [this message]
2024-05-07 16:17     ` Paolo Bonzini
2024-04-04 18:50 ` [PATCH 07/11] KVM: guest_memfd: extract __kvm_gmem_get_pfn() Paolo Bonzini
2024-04-09 23:35   ` Michael Roth
2024-04-24 22:34   ` Sean Christopherson
2024-04-24 22:59     ` Sean Christopherson
2024-04-04 18:50 ` [PATCH 08/11] KVM: guest_memfd: extract __kvm_gmem_punch_hole() Paolo Bonzini
2024-04-04 18:50 ` [PATCH 09/11] KVM: guest_memfd: Add interface for populating gmem pages with user data Paolo Bonzini
2024-04-22 14:44   ` Xu Yilun
2024-04-23 23:50   ` Isaku Yamahata
2024-04-24 22:24     ` Sean Christopherson
2024-04-25  1:12       ` Isaku Yamahata
2024-04-25  6:01         ` Paolo Bonzini
2024-04-25 16:00           ` Sean Christopherson
2024-04-25 16:51             ` Isaku Yamahata
2024-04-26  5:44               ` Paolo Bonzini
2024-04-26 17:15                 ` Isaku Yamahata
2024-04-26  5:41             ` Paolo Bonzini
2024-04-26 15:17               ` Sean Christopherson
2024-04-24 22:32   ` Sean Christopherson
2024-04-25  5:56     ` Paolo Bonzini
2024-04-04 18:50 ` [PATCH 10/11] KVM: guest_memfd: Add hook for invalidating memory Paolo Bonzini
2024-04-04 18:50 ` [PATCH 11/11] KVM: x86: Add gmem hook for determining max NPT mapping level Paolo Bonzini
2024-04-09 23:46   ` Michael Roth
2024-04-19 18:26     ` Isaku Yamahata
2024-04-22 14:52   ` Xu Yilun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZiZBjtQvUuuqqKNF@yilunxu-OptiPlex-7050 \
    --to=yilun.xu@linux.intel.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michael.roth@amd.com \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).