LKML Archive mirror
 help / color / mirror / Atom feed
From: David Woodhouse <dwmw2@infradead.org>
To: kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Sean Christopherson <seanjc@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Paul Durrant <paul@xen.org>, Shuah Khan <shuah@kernel.org>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org,
	Oliver Upton <oliver.upton@linux.dev>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	jalliste@amazon.co.uk, sveith@amazon.de, zide.chen@intel.com,
	Dongli Zhang <dongli.zhang@oracle.com>
Subject: [PATCH v2 01/15] KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init()
Date: Sat, 27 Apr 2024 12:04:58 +0100	[thread overview]
Message-ID: <20240427111929.9600-2-dwmw2@infradead.org> (raw)
In-Reply-To: <20240427111929.9600-1-dwmw2@infradead.org>

From: David Woodhouse <dwmw@amazon.co.uk>

The KVM clock is an interesting thing. It is defined as "nanoseconds
since the guest was created", but in practice it runs at two *different*
rates — or three different rates, if you count implementation bugs.

Definition A is that it runs synchronously with the CLOCK_MONOTONIC_RAW
of the host, with a delta of kvm->arch.kvmclock_offset.

But that version doesn't actually get used in the common case, where the
host has a reliable TSC and the guest TSCs are all running at the same
rate and in sync with each other, and kvm->arch.use_master_clock is set.

In that common case, definition B is used: There is a reference point in
time at kvm->arch.master_kernel_ns (again a CLOCK_MONOTONIC_RAW time),
and a corresponding host TSC value kvm->arch.master_cycle_now. This
fixed point in time is converted to guest units (the time offset by
kvmclock_offset and the TSC Value scaled and offset to be a guest TSC
value) and advertised to the guest in the pvclock structure. While in
this 'use_master_clock' mode, the fixed point in time never needs to be
changed, and the clock runs precisely in time with the guest TSC, at the
rate advertised in the pvclock structure.

The third definition C is implemented in kvm_get_wall_clock_epoch() and
__get_kvmclock(), using the master_cycle_now and master_kernel_ns fields
but converting the *host* TSC cycles directly to a value in nanoseconds
instead of scaling via the guest TSC.

One might naïvely think that all three definitions are identical, since
CLOCK_MONOTONIC_RAW is not skewed by NTP frequency corrections; all
three are just the result of counting the host TSC at a known frequency,
or the scaled guest TSC at a known precise fraction of the host's
frequency. The problem is with arithmetic precision, and the way that
frequency scaling is done in a division-free way by multiplying by a
scale factor, then shifting right. In practice, all three ways of
calculating the KVM clock will suffer a systemic drift from each other.

Eventually, definition C should just be eliminated. Commit 451a707813ae
("KVM: x86/xen: improve accuracy of Xen timers") worked around it for
the specific case of Xen timers, which are defined in terms of the KVM
clock and suffered from a continually increasing error in timer expiry
times. That commit notes that get_kvmclock_ns() is non-trivial to fix
and says "I'll come back to that", which remains true.

Definitions A and B do need to coexist, the former to handle the case
where the host or guest TSC is suboptimally configured. But KVM should
be more careful about switching between them, and the discontinuity in
guest time which could result.

In particular, KVM_REQ_MASTERCLOCK_UPDATE will take a new snapshot of
time as the reference in master_kernel_ns and master_cycle_now, yanking
the guest's clock back to match definition A at that moment.

When invoked from in 'use_master_clock' mode, kvm_update_masterclock()
should probably *adjust* kvm->arch.kvmclock_offset to account for the
drift, instead of yanking the clock back to defintion A. But in the
meantime there are a bunch of places where it just doesn't need to be
invoked at all.

To start with: there is no need to do such an update when a Xen guest
populates the shared_info page. This seems to have been a hangover from
the very first implementation of shared_info which automatically
populated the vcpu_info structures at their default locations, but even
then it should just have raised KVM_REQ_CLOCK_UPDATE on each vCPU
instead of using KVM_REQ_MASTERCLOCK_UPDATE. And now that userspace is
expected to explicitly set the vcpu_info even in its default locations,
there's not even any need for that either.

Fixes: 629b5348841a1 ("KVM: x86/xen: update wallclock region")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
---
 arch/x86/kvm/xen.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index f65b35a05d91..5a83a8154b79 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -98,8 +98,6 @@ static int kvm_xen_shared_info_init(struct kvm *kvm)
 	wc->version = wc_version + 1;
 	read_unlock_irq(&gpc->lock);
 
-	kvm_make_all_cpus_request(kvm, KVM_REQ_MASTERCLOCK_UPDATE);
-
 out:
 	srcu_read_unlock(&kvm->srcu, idx);
 	return ret;
-- 
2.44.0


  reply	other threads:[~2024-04-27 11:21 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-27 11:04 [RFC PATCH v2] Cleaning up the KVM clock mess David Woodhouse
2024-04-27 11:04 ` David Woodhouse [this message]
2024-05-04  7:42   ` [PATCH v2 01/15] KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init() Marcelo Tosatti
2024-05-07 19:08     ` David Woodhouse
2024-04-27 11:04 ` [PATCH v2 02/15] KVM: x86: Improve accuracy of KVM clock when TSC scaling is in force David Woodhouse
2024-04-27 11:05 ` [PATCH v2 03/15] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration David Woodhouse
2024-04-27 11:05 ` [PATCH v2 04/15] KVM: selftests: Add KVM/PV clock selftest to prove timer correction David Woodhouse
2024-04-27 11:05 ` [PATCH v2 05/15] KVM: x86: Explicitly disable TSC scaling without CONSTANT_TSC David Woodhouse
2024-04-27 11:05 ` [PATCH v2 06/15] KVM: x86: Add KVM_VCPU_TSC_SCALE and fix the documentation on TSC migration David Woodhouse
2024-04-27 11:05 ` [PATCH v2 07/15] KVM: x86: Avoid NTP frequency skew for KVM clock on 32-bit host David Woodhouse
2024-04-27 11:05 ` [PATCH v2 08/15] KVM: x86: Fix KVM clock precision in __get_kvmclock() David Woodhouse
2024-04-27 11:05 ` [PATCH v2 09/15] KVM: x86: Fix software TSC upscaling in kvm_update_guest_time() David Woodhouse
2024-04-27 11:05 ` [PATCH v2 10/15] KVM: x86: Simplify and comment kvm_get_time_scale() David Woodhouse
2024-04-27 11:05 ` [PATCH v2 11/15] KVM: x86: Remove implicit rdtsc() from kvm_compute_l1_tsc_offset() David Woodhouse
2024-04-27 11:05 ` [PATCH v2 12/15] KVM: x86: Improve synchronization in kvm_synchronize_tsc() David Woodhouse
2024-04-27 11:05 ` [PATCH v2 13/15] KVM: x86: Kill cur_tsc_{nsec,offset,write} fields David Woodhouse
2024-05-10  9:03   ` Chenyi Qiang
2024-05-14 13:17     ` David Woodhouse
2024-04-27 11:05 ` [PATCH v2 14/15] KVM: x86: Allow KVM master clock mode when TSCs are offset from each other David Woodhouse
2024-04-27 11:05 ` [PATCH v2 15/15] KVM: x86: Factor out kvm_use_master_clock() David Woodhouse
2024-05-01 17:55   ` Chen, Zide
2024-05-01 20:45     ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240427111929.9600-2-dwmw2@infradead.org \
    --to=dwmw2@infradead.org \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=dongli.zhang@oracle.com \
    --cc=hpa@zytor.com \
    --cc=jalliste@amazon.co.uk \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=oliver.upton@linux.dev \
    --cc=paul@xen.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=shuah@kernel.org \
    --cc=sveith@amazon.de \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    --cc=zide.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).