LKML Archive mirror
 help / color / mirror / Atom feed
From: Xiaoyao Li <xiaoyao.li@intel.com>
To: Reinette Chatre <reinette.chatre@intel.com>,
	isaku.yamahata@intel.com, pbonzini@redhat.com,
	erdemaktas@google.com, vkuznets@redhat.com, seanjc@google.com,
	vannapurve@google.com, jmattson@google.com, mlevitsk@redhat.com,
	chao.gao@intel.com, rick.p.edgecombe@intel.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V5 3/4] KVM: x86: Add a capability to configure bus frequency for APIC timer
Date: Mon, 29 Apr 2024 13:28:02 +0800	[thread overview]
Message-ID: <a9e7004f-0d2d-4a32-b663-a57391a9cedd@intel.com> (raw)
In-Reply-To: <6748a4c12269e756f0c48680da8ccc5367c31ce7.1714081726.git.reinette.chatre@intel.com>

On 4/26/2024 6:07 AM, Reinette Chatre wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Add KVM_CAP_X86_APIC_BUS_CYCLES_NS capability to configure the APIC
> bus clock frequency for APIC timer emulation.
> Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_CYCLES_NS) to set the
> frequency in nanoseconds. When using this capability, the user space
> VMM should configure CPUID leaf 0x15 to advertise the frequency.
> 
> Vishal reported that the TDX guest kernel expects a 25MHz APIC bus
> frequency but ends up getting interrupts at a significantly higher rate.
> 
> The TDX architecture hard-codes the core crystal clock frequency to
> 25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
> does not allow the VMM to override the value.
> 
> In addition, per Intel SDM:
>      "The APIC timer frequency will be the processor’s bus clock or core
>       crystal clock frequency (when TSC/core crystal clock ratio is
>       enumerated in CPUID leaf 0x15) divided by the value specified in
>       the divide configuration register."
> 
> The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
> APIC bus frequency of 1GHz.
> 
> The KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the user
> space VMM sets it using KVM_SET_CPUID. If the CPUID leaf 0x15 is
> enumerated, the guest kernel uses it as the APIC bus frequency. If not,
> the guest kernel measures the frequency based on other known timers like
> the ACPI timer or the legacy PIT. As reported by Vishal the TDX guest
> kernel expects a 25MHz timer frequency but gets timer interrupt more
> frequently due to the 1GHz frequency used by KVM.
> 
> To ensure that the guest doesn't have a conflicting view of the APIC bus
> frequency, allow the userspace to tell KVM to use the same frequency that
> TDX mandates instead of the default 1Ghz.
> 
> There are several options to address this:
> 1. Make the KVM able to configure APIC bus frequency (this series).
>     Pro: It resembles the existing hardware.  The recent Intel CPUs
>          adapts 25MHz.
>     Con: Require the VMM to emulate the APIC timer at 25MHz.
> 2. Make the TDX architecture enumerate CPUID leaf 0x15 to configurable
>     frequency or not enumerate it.
>     Pro: Any APIC bus frequency is allowed.
>     Con: Deviates from TDX architecture.
> 3. Make the TDX guest kernel use 1GHz when it's running on KVM.
>     Con: The kernel ignores CPUID leaf 0x15.
> 4. Change CPUID leaf 0x15 under TDX to report the crystal clock frequency
>     as 1 GHz.
>     Pro: This has been the virtual APIC frequency for KVM guests for 13
>          years.
>     Pro: This requires changing only one hard-coded constant in TDX.
>     Con: It doesn't work with other VMMs as TDX isn't specific to KVM.
>     Con: Core crystal clock frequency is also used to calculate TSC
>          frequency.
>     Con: If it is configured to value different from hardware, it will
>          break the correctness of INTEL-PT Mini Time Count (MTC) packets
>          in TDs.
> 
> Reported-by: Vishal Annapurve <vannapurve@google.com>
> Closes: https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@google.com/
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Co-developed-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
> Changes v5:
> - Rename capability KVM_CAP_X86_APIC_BUS_FREQUENCY ->
>    KVM_CAP_X86_APIC_BUS_CYCLES_NS. (Xiaoyao Li)
> - Add Rick's Reviewed-by tag.
> 
> Changes v4:
> - Rework implementation following Sean's guidance in:
>    https://lore.kernel.org/all/ZdjzIgS6EAeCsUue@google.com/
> - Reword con #2 to acknowledge feedback. (Sean)
> - Add the "con" information from Xiaoyao during earlier review of v2.
> - Rework changelog to address comments related to "bus clock" vs
>    "core crystal clock" frequency. (Xiaoyao)
> - Drop snippet about impact on TSC deadline timer emulation. (Maxim)
> - Drop Maxim Levitsky's "Reviewed-by" tag due to many changes to patch
>    since tag received.
> - Switch "Subject:" to match custom "KVM: X86:" -> "KVM: x86:"
> 
> Changes v3:
> - Added reviewed-by Maxim Levitsky.
> - Minor update of the commit message.
> 
> Changes v2:
> - Add check if vcpu isn't created.
> - Add check if lapic chip is in-kernel emulation.
> - Fix build error for i386.
> - Add document to api.rst.
> - Typo in the commit message.
> 
>   Documentation/virt/kvm/api.rst | 17 +++++++++++++++++
>   arch/x86/kvm/x86.c             | 27 +++++++++++++++++++++++++++
>   include/uapi/linux/kvm.h       |  1 +
>   3 files changed, 45 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index f0b76ff5030d..f014cd9b2217 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8063,6 +8063,23 @@ error/annotated fault.
>   
>   See KVM_EXIT_MEMORY_FAULT for more information.
>   
> +7.35 KVM_CAP_X86_APIC_BUS_CYCLES_NS
> +-----------------------------------
> +
> +:Architectures: x86
> +:Target: VM
> +:Parameters: args[0] is the desired APIC bus clock rate, in nanoseconds
> +:Returns: 0 on success, -EINVAL if args[0] contains an invalid value for the
> +          frequency or if any vCPUs have been created, -ENXIO if a virtual
> +          local APIC has not been created using KVM_CREATE_IRQCHIP.
> +
> +This capability sets VM's APIC bus clock frequency, used by KVM's in-kernel
> +virtual APIC when emulating APIC timers.  KVM's default value can be retrieved
> +by KVM_CHECK_EXTENSION.
> +
> +Note: Userspace is responsible for correctly configuring CPUID 0x15, a.k.a. the
> +core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest.
> +
>   8. Other capabilities.
>   ======================
>   
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 10e6315103f4..fa6954c9a9d2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4715,6 +4715,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   	case KVM_CAP_MEMORY_FAULT_INFO:
>   		r = 1;
>   		break;
> +	case KVM_CAP_X86_APIC_BUS_CYCLES_NS:
> +		r = APIC_BUS_CYCLE_NS_DEFAULT;
> +		break;
>   	case KVM_CAP_EXIT_HYPERCALL:
>   		r = KVM_EXIT_HYPERCALL_VALID_MASK;
>   		break;
> @@ -6755,6 +6758,30 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>   		}
>   		mutex_unlock(&kvm->lock);
>   		break;
> +	case KVM_CAP_X86_APIC_BUS_CYCLES_NS: {
> +		u64 bus_cycle_ns = cap->args[0];
> +		u64 unused;
> +
> +		r = -EINVAL;
> +		/*
> +		 * Guard against overflow in tmict_to_ns(). 128 is the highest
> +		 * divide value that can be programmed in APIC_TDCR.
> +		 */
> +		if (!bus_cycle_ns ||
> +		    check_mul_overflow((u64)U32_MAX * 128, bus_cycle_ns, &unused))
> +			break;
> +
> +		r = 0;
> +		mutex_lock(&kvm->lock);
> +		if (!irqchip_in_kernel(kvm))
> +			r = -ENXIO;
> +		else if (kvm->created_vcpus)
> +			r = -EINVAL;
> +		else
> +			kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;
> +		mutex_unlock(&kvm->lock);
> +		break;
> +	}
>   	default:
>   		r = -EINVAL;
>   		break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 2190adbe3002..6a4d9432ab11 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -917,6 +917,7 @@ struct kvm_enable_cap {
>   #define KVM_CAP_MEMORY_ATTRIBUTES 233
>   #define KVM_CAP_GUEST_MEMFD 234
>   #define KVM_CAP_VM_TYPES 235
> +#define KVM_CAP_X86_APIC_BUS_CYCLES_NS 236
>   
>   struct kvm_irq_routing_irqchip {
>   	__u32 irqchip;


  parent reply	other threads:[~2024-04-29  5:28 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-25 22:06 [PATCH V5 0/4] KVM: x86: Make bus clock frequency for vAPIC timer configurable Reinette Chatre
2024-04-25 22:06 ` [PATCH V5 1/4] KVM: x86: hyper-v: Calculate APIC bus frequency for Hyper-V Reinette Chatre
2024-04-25 22:07 ` [PATCH V5 2/4] KVM: x86: Make nsec per APIC bus cycle a VM variable Reinette Chatre
2024-04-29  4:23   ` Xiaoyao Li
2024-04-25 22:07 ` [PATCH V5 3/4] KVM: x86: Add a capability to configure bus frequency for APIC timer Reinette Chatre
2024-04-28  6:24   ` Yuan Yao
2024-04-29  5:28   ` Xiaoyao Li [this message]
2024-04-25 22:07 ` [PATCH V5 4/4] KVM: selftests: Add test for configure of x86 APIC bus frequency Reinette Chatre
2024-04-26 23:06   ` Chen, Zide
2024-04-26 23:26     ` Reinette Chatre
2024-04-27  5:38       ` Chen, Zide
2024-04-29 15:23         ` Reinette Chatre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a9e7004f-0d2d-4a32-b663-a57391a9cedd@intel.com \
    --to=xiaoyao.li@intel.com \
    --cc=chao.gao@intel.com \
    --cc=erdemaktas@google.com \
    --cc=isaku.yamahata@intel.com \
    --cc=jmattson@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mlevitsk@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=reinette.chatre@intel.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=seanjc@google.com \
    --cc=vannapurve@google.com \
    --cc=vkuznets@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).