Linux-Doc Archive mirror
 help / color / mirror / Atom feed
From: David Woodhouse <dwmw2@infradead.org>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
	Sean Christopherson <seanjc@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	 Paul Durrant <paul@xen.org>, Shuah Khan <shuah@kernel.org>,
	linux-doc@vger.kernel.org,  linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org,
	Oliver Upton <oliver.upton@linux.dev>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	 jalliste@amazon.co.uk, sveith@amazon.de
Subject: Re: [PATCH 02/10] KVM: x86: Improve accuracy of KVM clock when TSC scaling is in force
Date: Mon, 22 Apr 2024 17:44:31 +0100	[thread overview]
Message-ID: <0bb540291542fd37b2d0f6d39ebf1c9f35fa1ac3.camel@infradead.org> (raw)
In-Reply-To: <CABgObfbK0aqqmAz7Z2efX4hNf7WRBYpoJ1a07oKMZdFXS2r0+g@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4941 bytes --]

On Mon, 2024-04-22 at 17:54 +0200, Paolo Bonzini wrote:
> On Mon, Apr 22, 2024 at 5:39 PM David Woodhouse <dwmw2@infradead.org> wrote:
> 
> > > ... especially considering that you did use a 64-bit integer here
> > > (though---please use u64 not uint64_t; and BTW if you want to add a
> > > patch to change kvm_get_time_scale() to u64, please do.
> > 
> > Meh, I'm used to programming in C. Yes, I *am* old enough to have been
> > doing this since the last decade of the 1900s, but it *has* been a long
> > time since 1999, and my fingers have learned :)
> 
> Oh, I am on the same page (working on both QEMU and Linux, adapting my
> muscle memory to the context sucks) but u64/s64 is the preferred
> spelling and I have been asked to use them before.

Ever heard of Jury Nullification... ? :)

> > Heh, looks like it was you who made it uint64_t, in 2016. In a commit
> > (3ae13faac) which said "Prepare for improving the precision in the next
> > patch"... which never came, AFAICT?
> 
> Yes, it was posted as
> https://lore.kernel.org/lkml/1454944711-33022-5-git-send-email-pbonzini@redhat.com/
> but not committed.

Ah, thanks. So that isn't about arithmetic precision, but about dealing
with the mess we had where the KVM clock was afflicted by NTP skew.

We live in a much saner world now where it's simply based on the guest
TSC (or, in pathological cases, the host's CLOCK_MONOTONIC_RAW. 

> As an aside, we discovered later that the patch you list as "Fixes"
> fixed another tricky bug: before, kvmclock could jump if the TSC is
> set within the 250 ppm tolerance that does not activate TSC scaling.
> This is possible after a first live migration, and then the second
> live migration used the guest TSC frequency *that userspace desired*
> instead of the *actual* TSC frequency.

Given our saner world where the KVM clock now *isn't* adjusted for NTP
skew, that "bug" was probably better left unfixed. In fact, I may give
some thought to reverting commit 78db6a503796 completely.

Perhaps we should call that "definition E". I think we're up to five
now? But no, let's not add historical ones to the taxonomy :)

I believe Jack's KVM_SET_CLOCK_GUEST fixes the fundamental issue there
of the clock jumping on migration. It's just a special case of the
breakage in KVM_REQ_MASTERCLOCK_UPDATE, where the KVM clock has happily
been running as a direct function of the guest TSC, and when we yank it
back to some other definition when we create a new masterclock
reference point.

With KVM_SET_CLOCK_GUEST, the KVM clock is restored as a function of
the guest TSC, rather than based on realtime/CLOCK_MONOTONIC_RAW/etc.

So even though a new masterclock reference is taken in the new kernel
(and the scaling factors in the pvclock may be slightly different, as
we discussed in the comment you responded to), the ka->kvmclock_offset
is adjusted so that the value of the KVM clock at *that* moment when
the new reference is taken, is precisely what it would have been under
the old kvmclock regime for the contemporary guest TSC value.




> Before:
> 
>         this_tsc_khz = __this_cpu_read(cpu_tsc_khz);
>         if (unlikely(vcpu->hw_tsc_khz != this_tsc_khz)) {
>                 tgt_tsc_khz = vcpu->virtual_tsc_khz;
>                 kvm_get_time_scale(NSEC_PER_SEC / 1000, tgt_tsc_khz,
>                         &vcpu->hv_clock.tsc_shift,
>                         &vcpu->hv_clock.tsc_to_system_mul);
>                 vcpu->hw_tsc_khz = this_tsc_khz;
> 
> After:
> 
>         tgt_tsc_khz = __this_cpu_read(cpu_tsc_khz);
> 
>         // tgt_tsc_khz unchanged because TSC scaling was not enabled
>         tgt_tsc_khz = kvm_scale_tsc(v, tgt_tsc_khz);
> 
>         if (unlikely(vcpu->hw_tsc_khz != tgt_tsc_khz)) {
>                 kvm_get_time_scale(NSEC_PER_SEC / 1000, tgt_tsc_khz,
>                         &vcpu->hv_clock.tsc_shift,
>                         &vcpu->hv_clock.tsc_to_system_mul);
>                 vcpu->hw_tsc_khz = tgt_tsc_khz;
> 
> So in the first case kvm_get_time_scale uses vcpu->virtual_tsc_khz, in
> the second case it uses __this_cpu_read(cpu_tsc_khz).
> 
> This then caused a mismatch between the actual guest frequency and
> what is used by kvm_guest_time_update, which only becomes visible when
> migration resets the clock with KVM_GET/SET_CLOCK. KVM_GET_CLOCK
> returns what _should have been_ the same value read by the guest, but
> it's wrong.

Hm? Until I fixed it in this series, KVM_GET_CLOCK didn't even *try*
scaling via the guest's TSC frequency, did it? It just converted from
the *host* TSC to nanoseconds (since master_kernel_now) directly. Which
was even *more* broken :)


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

  reply	other threads:[~2024-04-22 16:44 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-18 19:34 [RFC PATCH 0/10] Cleaning up the KVM clock mess David Woodhouse
2024-04-18 19:34 ` [PATCH 01/10] KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init() David Woodhouse
2024-04-18 19:34 ` [PATCH 02/10] KVM: x86: Improve accuracy of KVM clock when TSC scaling is in force David Woodhouse
2024-04-19 15:29   ` Paul Durrant
2024-04-22 12:22   ` Paolo Bonzini
2024-04-22 15:39     ` David Woodhouse
2024-04-22 15:54       ` Paolo Bonzini
2024-04-22 16:44         ` David Woodhouse [this message]
2024-04-18 19:34 ` [PATCH 03/10] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration David Woodhouse
2024-04-19 15:40   ` Paul Durrant
2024-04-19 15:49     ` David Woodhouse
2024-04-22 14:11   ` Paolo Bonzini
2024-04-22 15:02     ` David Woodhouse
2024-04-18 19:34 ` [PATCH 04/10] KVM: selftests: Add KVM/PV clock selftest to prove timer correction David Woodhouse
2024-04-19 15:44   ` Paul Durrant
2024-04-18 19:34 ` [PATCH 05/10] KVM: x86: Explicitly disable TSC scaling without CONSTANT_TSC David Woodhouse
2024-04-19 15:45   ` Paul Durrant
2024-04-18 19:34 ` [PATCH 06/10] KVM: x86: Add KVM_VCPU_TSC_SCALE and fix the documentation on TSC migration David Woodhouse
2024-04-19 15:49   ` Paul Durrant
2024-04-19 15:53     ` David Woodhouse
2024-04-18 19:34 ` [PATCH 07/10] KVM: x86: Avoid NTP frequency skew for KVM clock on 32-bit host David Woodhouse
2024-04-19 15:53   ` Paul Durrant
2024-04-18 19:34 ` [PATCH 08/10] KVM: x86: Remove periodic global clock updates David Woodhouse
2024-04-19 15:55   ` Paul Durrant
2024-04-18 19:34 ` [PATCH 09/10] KVM: x86: Kill KVM_REQ_GLOBAL_CLOCK_UPDATE David Woodhouse
2024-04-19 15:57   ` Paul Durrant
2024-04-18 19:34 ` [PATCH 10/10] KVM: x86: Fix KVM clock precision in __get_kvmclock() David Woodhouse
2024-04-19 16:07   ` Paul Durrant
2024-04-19 12:51 ` [PATCH 11/10] KVM: x86: Fix software TSC upscaling in kvm_update_guest_time() David Woodhouse
2024-04-19 12:52 ` [RFC PATCH 0/10] Cleaning up the KVM clock mess David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0bb540291542fd37b2d0f6d39ebf1c9f35fa1ac3.camel@infradead.org \
    --to=dwmw2@infradead.org \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=jalliste@amazon.co.uk \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=oliver.upton@linux.dev \
    --cc=paul@xen.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=shuah@kernel.org \
    --cc=sveith@amazon.de \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).