From: Xiaoyao Li <xiaoyao.li@intel.com>
To: Reinette Chatre <reinette.chatre@intel.com>,
isaku.yamahata@intel.com, pbonzini@redhat.com,
erdemaktas@google.com, vkuznets@redhat.com, seanjc@google.com,
vannapurve@google.com, jmattson@google.com, mlevitsk@redhat.com,
chao.gao@intel.com, rick.p.edgecombe@intel.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V5 3/4] KVM: x86: Add a capability to configure bus frequency for APIC timer
Date: Mon, 29 Apr 2024 13:28:02 +0800 [thread overview]
Message-ID: <a9e7004f-0d2d-4a32-b663-a57391a9cedd@intel.com> (raw)
In-Reply-To: <6748a4c12269e756f0c48680da8ccc5367c31ce7.1714081726.git.reinette.chatre@intel.com>
On 4/26/2024 6:07 AM, Reinette Chatre wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Add KVM_CAP_X86_APIC_BUS_CYCLES_NS capability to configure the APIC
> bus clock frequency for APIC timer emulation.
> Allow KVM_ENABLE_CAPABILITY(KVM_CAP_X86_APIC_BUS_CYCLES_NS) to set the
> frequency in nanoseconds. When using this capability, the user space
> VMM should configure CPUID leaf 0x15 to advertise the frequency.
>
> Vishal reported that the TDX guest kernel expects a 25MHz APIC bus
> frequency but ends up getting interrupts at a significantly higher rate.
>
> The TDX architecture hard-codes the core crystal clock frequency to
> 25MHz and mandates exposing it via CPUID leaf 0x15. The TDX architecture
> does not allow the VMM to override the value.
>
> In addition, per Intel SDM:
> "The APIC timer frequency will be the processor’s bus clock or core
> crystal clock frequency (when TSC/core crystal clock ratio is
> enumerated in CPUID leaf 0x15) divided by the value specified in
> the divide configuration register."
>
> The resulting 25MHz APIC bus frequency conflicts with the KVM hardcoded
> APIC bus frequency of 1GHz.
>
> The KVM doesn't enumerate CPUID leaf 0x15 to the guest unless the user
> space VMM sets it using KVM_SET_CPUID. If the CPUID leaf 0x15 is
> enumerated, the guest kernel uses it as the APIC bus frequency. If not,
> the guest kernel measures the frequency based on other known timers like
> the ACPI timer or the legacy PIT. As reported by Vishal the TDX guest
> kernel expects a 25MHz timer frequency but gets timer interrupt more
> frequently due to the 1GHz frequency used by KVM.
>
> To ensure that the guest doesn't have a conflicting view of the APIC bus
> frequency, allow the userspace to tell KVM to use the same frequency that
> TDX mandates instead of the default 1Ghz.
>
> There are several options to address this:
> 1. Make the KVM able to configure APIC bus frequency (this series).
> Pro: It resembles the existing hardware. The recent Intel CPUs
> adapts 25MHz.
> Con: Require the VMM to emulate the APIC timer at 25MHz.
> 2. Make the TDX architecture enumerate CPUID leaf 0x15 to configurable
> frequency or not enumerate it.
> Pro: Any APIC bus frequency is allowed.
> Con: Deviates from TDX architecture.
> 3. Make the TDX guest kernel use 1GHz when it's running on KVM.
> Con: The kernel ignores CPUID leaf 0x15.
> 4. Change CPUID leaf 0x15 under TDX to report the crystal clock frequency
> as 1 GHz.
> Pro: This has been the virtual APIC frequency for KVM guests for 13
> years.
> Pro: This requires changing only one hard-coded constant in TDX.
> Con: It doesn't work with other VMMs as TDX isn't specific to KVM.
> Con: Core crystal clock frequency is also used to calculate TSC
> frequency.
> Con: If it is configured to value different from hardware, it will
> break the correctness of INTEL-PT Mini Time Count (MTC) packets
> in TDs.
>
> Reported-by: Vishal Annapurve <vannapurve@google.com>
> Closes: https://lore.kernel.org/lkml/20231006011255.4163884-1-vannapurve@google.com/
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Co-developed-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> Changes v5:
> - Rename capability KVM_CAP_X86_APIC_BUS_FREQUENCY ->
> KVM_CAP_X86_APIC_BUS_CYCLES_NS. (Xiaoyao Li)
> - Add Rick's Reviewed-by tag.
>
> Changes v4:
> - Rework implementation following Sean's guidance in:
> https://lore.kernel.org/all/ZdjzIgS6EAeCsUue@google.com/
> - Reword con #2 to acknowledge feedback. (Sean)
> - Add the "con" information from Xiaoyao during earlier review of v2.
> - Rework changelog to address comments related to "bus clock" vs
> "core crystal clock" frequency. (Xiaoyao)
> - Drop snippet about impact on TSC deadline timer emulation. (Maxim)
> - Drop Maxim Levitsky's "Reviewed-by" tag due to many changes to patch
> since tag received.
> - Switch "Subject:" to match custom "KVM: X86:" -> "KVM: x86:"
>
> Changes v3:
> - Added reviewed-by Maxim Levitsky.
> - Minor update of the commit message.
>
> Changes v2:
> - Add check if vcpu isn't created.
> - Add check if lapic chip is in-kernel emulation.
> - Fix build error for i386.
> - Add document to api.rst.
> - Typo in the commit message.
>
> Documentation/virt/kvm/api.rst | 17 +++++++++++++++++
> arch/x86/kvm/x86.c | 27 +++++++++++++++++++++++++++
> include/uapi/linux/kvm.h | 1 +
> 3 files changed, 45 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index f0b76ff5030d..f014cd9b2217 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8063,6 +8063,23 @@ error/annotated fault.
>
> See KVM_EXIT_MEMORY_FAULT for more information.
>
> +7.35 KVM_CAP_X86_APIC_BUS_CYCLES_NS
> +-----------------------------------
> +
> +:Architectures: x86
> +:Target: VM
> +:Parameters: args[0] is the desired APIC bus clock rate, in nanoseconds
> +:Returns: 0 on success, -EINVAL if args[0] contains an invalid value for the
> + frequency or if any vCPUs have been created, -ENXIO if a virtual
> + local APIC has not been created using KVM_CREATE_IRQCHIP.
> +
> +This capability sets VM's APIC bus clock frequency, used by KVM's in-kernel
> +virtual APIC when emulating APIC timers. KVM's default value can be retrieved
> +by KVM_CHECK_EXTENSION.
> +
> +Note: Userspace is responsible for correctly configuring CPUID 0x15, a.k.a. the
> +core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest.
> +
> 8. Other capabilities.
> ======================
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 10e6315103f4..fa6954c9a9d2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4715,6 +4715,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_MEMORY_FAULT_INFO:
> r = 1;
> break;
> + case KVM_CAP_X86_APIC_BUS_CYCLES_NS:
> + r = APIC_BUS_CYCLE_NS_DEFAULT;
> + break;
> case KVM_CAP_EXIT_HYPERCALL:
> r = KVM_EXIT_HYPERCALL_VALID_MASK;
> break;
> @@ -6755,6 +6758,30 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> }
> mutex_unlock(&kvm->lock);
> break;
> + case KVM_CAP_X86_APIC_BUS_CYCLES_NS: {
> + u64 bus_cycle_ns = cap->args[0];
> + u64 unused;
> +
> + r = -EINVAL;
> + /*
> + * Guard against overflow in tmict_to_ns(). 128 is the highest
> + * divide value that can be programmed in APIC_TDCR.
> + */
> + if (!bus_cycle_ns ||
> + check_mul_overflow((u64)U32_MAX * 128, bus_cycle_ns, &unused))
> + break;
> +
> + r = 0;
> + mutex_lock(&kvm->lock);
> + if (!irqchip_in_kernel(kvm))
> + r = -ENXIO;
> + else if (kvm->created_vcpus)
> + r = -EINVAL;
> + else
> + kvm->arch.apic_bus_cycle_ns = bus_cycle_ns;
> + mutex_unlock(&kvm->lock);
> + break;
> + }
> default:
> r = -EINVAL;
> break;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 2190adbe3002..6a4d9432ab11 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -917,6 +917,7 @@ struct kvm_enable_cap {
> #define KVM_CAP_MEMORY_ATTRIBUTES 233
> #define KVM_CAP_GUEST_MEMFD 234
> #define KVM_CAP_VM_TYPES 235
> +#define KVM_CAP_X86_APIC_BUS_CYCLES_NS 236
>
> struct kvm_irq_routing_irqchip {
> __u32 irqchip;
next prev parent reply other threads:[~2024-04-29 5:28 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-25 22:06 [PATCH V5 0/4] KVM: x86: Make bus clock frequency for vAPIC timer configurable Reinette Chatre
2024-04-25 22:06 ` [PATCH V5 1/4] KVM: x86: hyper-v: Calculate APIC bus frequency for Hyper-V Reinette Chatre
2024-04-25 22:07 ` [PATCH V5 2/4] KVM: x86: Make nsec per APIC bus cycle a VM variable Reinette Chatre
2024-04-29 4:23 ` Xiaoyao Li
2024-04-25 22:07 ` [PATCH V5 3/4] KVM: x86: Add a capability to configure bus frequency for APIC timer Reinette Chatre
2024-04-28 6:24 ` Yuan Yao
2024-04-29 5:28 ` Xiaoyao Li [this message]
2024-04-25 22:07 ` [PATCH V5 4/4] KVM: selftests: Add test for configure of x86 APIC bus frequency Reinette Chatre
2024-04-26 23:06 ` Chen, Zide
2024-04-26 23:26 ` Reinette Chatre
2024-04-27 5:38 ` Chen, Zide
2024-04-29 15:23 ` Reinette Chatre
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a9e7004f-0d2d-4a32-b663-a57391a9cedd@intel.com \
--to=xiaoyao.li@intel.com \
--cc=chao.gao@intel.com \
--cc=erdemaktas@google.com \
--cc=isaku.yamahata@intel.com \
--cc=jmattson@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mlevitsk@redhat.com \
--cc=pbonzini@redhat.com \
--cc=reinette.chatre@intel.com \
--cc=rick.p.edgecombe@intel.com \
--cc=seanjc@google.com \
--cc=vannapurve@google.com \
--cc=vkuznets@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).