All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Hilber <peter.hilber@opensynergy.com>
To: David Woodhouse <dwmw2@infradead.org>,
	linux-kernel@vger.kernel.org, virtualization@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, linux-rtc@vger.kernel.org,
	"Ridoux, Julien" <ridouxj@amazon.com>,
	virtio-dev@lists.linux.dev
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>,
	Jason Wang <jasowang@redhat.com>,
	John Stultz <jstultz@google.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	netdev@vger.kernel.org,
	Richard Cochran <richardcochran@gmail.com>,
	Stephen Boyd <sboyd@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
	Marc Zyngier <maz@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Alessandro Zummo <a.zummo@towertech.it>,
	Alexandre Belloni <alexandre.belloni@bootlin.com>
Subject: Re: [RFC PATCH v3 0/7] Add virtio_rtc module and related changes
Date: Thu, 20 Jun 2024 14:37:45 +0200	[thread overview]
Message-ID: <671a784b-234f-4be6-80bf-5135e257ed40@opensynergy.com> (raw)
In-Reply-To: <684eac07834699889fdb67be4cee09319c994a42.camel@infradead.org>

Changing virtio-dev address to the new one. The discussion might also be
relevant for virtio-comment, but it is discouraged to forward it to both.

On 15.06.24 10:40, David Woodhouse wrote:
> On Mon, 2023-12-18 at 08:38 +0100, Peter Hilber wrote:
>> RFC v3 updates
>> --------------
>>
>> This series implements a driver for a virtio-rtc device conforming to spec
>> RFC v3 [1]. It now includes an RTC class driver with alarm, in addition to
>> the PTP clock driver already present before.
>>
>> This patch series depends on the patch series "treewide: Use clocksource id
>> for get_device_system_crosststamp()" [3]. Pull [4] to get the combined
>> series on top of mainline.
>>
>> Overview
>> --------
>>
>> This patch series adds the virtio_rtc module, and related bugfixes. The
>> virtio_rtc module implements a driver compatible with the proposed Virtio
>> RTC device specification [1]. The Virtio RTC (Real Time Clock) device
>> provides information about current time. The device can provide different
>> clocks, e.g. for the UTC or TAI time standards, or for physical time
>> elapsed since some past epoch. The driver can read the clocks with simple
>> or more accurate methods. Optionally, the driver can set an alarm.
>>
>> The series first fixes some bugs in the get_device_system_crosststamp()
>> interpolation code, which is required for reliable virtio_rtc operation.
>> Then, add the virtio_rtc implementation.
>>
>> For the Virtio RTC device, there is currently a proprietary implementation,
>> which has been used for provisional testing.
> 
> As discussed before, I don't think it makes sense to design a new high-
> precision virtual clock which only gets it right *most* of the time. We
> absolutely need to address the issue of live migration.
> 
> When live migration occurs, the guest's time precision suffers in two
> ways.
> 
> First, even when migrating to a supposedly identical host the precise
> rate of the underlying counter (TSC, arch counter, etc.) can change
> within the tolerances (e.g. ±50PPM) of the hardware. Unlike the natural
> changes that NTP normally manages to track, this is a *step* change,
> potentially from -50PPM to +50PPM from one host to the next.
> 
> Second, the accuracy of the counter as preserved across migration is
> limited by the accuracy of each host's NTP synchronization. So there is
> also a step change in the value of the counter itself.
> 
> At the moment of migration, the guest's timekeeping should be
> considered invalid. Any previous cross-timestamps are meaningless, and
> blindly using the previously-known relationship between the counter and
> real time can lead to problems such as corruption in distributed
> databases, fines for mis-timestamped transactions, and other general
> unhappiness.
> 
> We obviously can't get a new timestamp from the virtio_rtc device every
> time an application wants to know the time reliably. We don't even want
> *system* calls for that, which is why we have it in a vDSO.
> 
> We can take the same approach to warning the guest about clock
> disruption due to live migration. A shared memory region from the
> virtual clock device even be mapped all the way to userspace, for those
> applications which need precise and *reliable* time to check a
> 'disruption' marker in it, and do whatever is appropriate (e.g. abort
> transactions and wait for time to resync) when it happens.
> 
> We can do better than just letting the guest know about disruption, of
> course. We can put the actual counter→realtime relationship into the
> memory region too. As well as being able to provide a PTP driver with
> this, the applications which care about *reliable* timestamps can mmap
> the page directly and use it, vDSO-style, to have accurate timestamps
> even from the first cycle after migration.
> 
> When disruption is signalled, the guest needs to throw away any
> *additional* refinement that it's done with NTP/PTP/PPS/etc. and revert
> to knowing nothing more than what the hypervisor advertises here.
> 
> Here's a first attempt at defining such a memory structure. For now
> I've done it as a "vmclock" ACPI device based loosely on vmgenid, but I
> think it makes most sense for this to be part of the virtio_rtc spec.
> Ultimately it doesn't matter *how* the guest finds the memory region.

This looks sensible to me. I also think it would be possible to adapt this for
the virtio-rtc spec. The proposal also supports some other use cases which are
not in the virtio-rtc RFC spec v4 [2], notably vDSO-style clock reading, and
others such as indication of a past leap second.

Compared to the virtio-rtc RFC spec v4 [2], this proposal does not seem to
support leap second smearing. Also, it would be helpful to allow indicating
when some of the fields are not valid (such as leapsecond_counter_value,
leapsecond_tai_time, tai_offset_sec, utc_time_maxerror_picosec, ...).

Do you have plans to contribute this to the Virtio specification and Linux
driver implementation?

I also added a few comments below.

Thanks for sharing,

Peter

[2] https://lore.kernel.org/virtio-comment/20240522170003.52565-1-peter.hilber@opensynergy.com/

> 
> Very preliminary QEMU hacks at 
> https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/vmclock
> (I still need to do the KVM side helper for actually filling in the
> host clock information, converted to the *guest* TSC)
> 
> This is the guest side. H aving heckled your assumption that we can use
> the virt counter on Arm, I concede I'm doing the same thing for now.
> The structure itself is at the end, or see
> https://git.infradead.org/?p=users/dwmw2/linux.git;a=blob;f=include/uapi/linux/vmclock.h;hb=vmclock
> 
> From 9e1c3b823d497efa4e0acb21b226a72e4d6e8a53 Mon Sep 17 00:00:00 2001
> From: David Woodhouse <dwmw@amazon.co.uk>
> Date: Mon, 10 Jun 2024 15:10:11 +0100
> Subject: [PATCH] ptp: Add vDSO-style vmclock support
> 
> The vmclock "device" provides a shared memory region with precision clock
> information. By using shared memory, it is safe across Live Migration.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  drivers/ptp/Kconfig          |  13 ++
>  drivers/ptp/Makefile         |   1 +
>  drivers/ptp/ptp_vmclock.c    | 404 +++++++++++++++++++++++++++++++++++
>  include/uapi/linux/vmclock.h | 102 +++++++++
>  4 files changed, 520 insertions(+)
>  create mode 100644 drivers/ptp/ptp_vmclock.c
>  create mode 100644 include/uapi/linux/vmclock.h
> 

[...]

> +
> +static const struct ptp_clock_info ptp_vmclock_info = {
> +	.owner		= THIS_MODULE,
> +	.max_adj	= 0,
> +	.n_ext_ts	= 0,
> +	.n_pins		= 0,
> +	.pps		= 0,
> +	.adjfine	= ptp_vmclock_adjfine,
> +	.adjtime	= ptp_vmclock_adjtime,
> +	.gettime64	= ptp_vmclock_gettime,

Should implement .gettimex64 instead.

> +	.settime64	= ptp_vmclock_settime,
> +	.enable		= ptp_vmclock_enable,
> +	.getcrosststamp = ptp_vmclock_getcrosststamp,
> +};
> +

[...]

> +/*
> + * This structure provides a vDSO-style clock to VM guests, exposing the
> + * relationship (or lack thereof) between the CPU clock (TSC, timebase, arch
> + * counter, etc.) and real time. It is designed to address the problem of
> + * live migration, which other clock enlightenments do not.
> + *
> + * When a guest is live migrated, this affects the clock in two ways.
> + *
> + * First, even between identical hosts the actual frequency of the underlying
> + * counter will change within the tolerances of its specification (typically
> + * ±50PPM, or 4 seconds a day). The frequency also varies over time on the
> + * same host, but can be tracked by NTP as it generally varies slowly. With
> + * live migration there is a step change in the frequency, with no warning.
> + *
> + * Second, there may be a step change in the value of the counter itself, as
> + * its accuracy is limited by the precision of the NTP synchronization on the
> + * source and destination hosts.
> + *
> + * So any calibration (NTP, PTP, etc.) which the guest has done on the source
> + * host before migration is invalid, and needs to be redone on the new host.
> + *
> + * In its most basic mode, this structure provides only an indication to the
> + * guest that live migration has occurred. This allows the guest to know that
> + * its clock is invalid and take remedial action. For applications that need
> + * reliable accurate timestamps (e.g. distributed databases), the structure
> + * can be mapped all the way to userspace. This allows the application to see
> + * directly for itself that the clock is disrupted and take appropriate
> + * action, even when using a vDSO-style method to get the time instead of a
> + * system call.
> + *
> + * In its more advanced mode. this structure can also be used to expose the
> + * precise relationship of the CPU counter to real time, as calibrated by the
> + * host. This means that userspace applications can have accurate time
> + * immediately after live migration, rather than having to pause operations
> + * and wait for NTP to recover. This mode does, of course, rely on the
> + * counter being reliable and consistent across CPUs.
> + */
> +
> +#ifdef __KERNEL__
> +#include <linux/types.h>
> +#else
> +#include <stdint.h>
> +#endif
> +
> +struct vmclock_abi {
> +	uint32_t magic;
> +#define VMCLOCK_MAGIC	0x4b4c4356 /* "VCLK" */
> +	uint16_t size;		/* Size of page containing this structure */
> +	uint16_t version;	/* 1 */
> +
> +	/* Sequence lock. Low bit means an update is in progress. */
> +	uint64_t seq_count;
> +
> +	/*
> +	 * This field changes to another non-repeating value when the CPU
> +	 * counter is disrupted, for example on live migration.
> +	 */
> +	uint64_t disruption_marker;
> +
> +	uint8_t clock_status;
> +#define VMCLOCK_STATUS_UNKNOWN		0
> +#define VMCLOCK_STATUS_INITIALIZING	1
> +#define VMCLOCK_STATUS_SYNCHRONIZED	2
> +#define VMCLOCK_STATUS_FREERUNNING	3
> +#define VMCLOCK_STATUS_UNRELIABLE	4
> +
> +	uint8_t counter_id;
> +#define VMCLOCK_COUNTER_INVALID		0
> +#define VMCLOCK_COUNTER_X86_TSC		1
> +#define VMCLOCK_COUNTER_ARM_VCNT	2
> +
> +	uint8_t pad[3];
> +
> +	/*
> +	 * UTC on its own is non-monotonic and ambiguous.
> +	 *
> +	 * Inform the guest about the TAI offset, so that it can have an
> +	 * actual monotonic and reliable time reference if it needs it.
> +	 *
> +	 * Also indicate a nearby leap second, if one exists. Unlike in
> +	 * NTP, this may indicate a leap second in the past in order to
> +	 * indicate that it *has* been taken into account.
> +	 */
> +	int8_t leapsecond_direction;
> +	int16_t tai_offset_sec;
> +	uint64_t leapsecond_counter_value;
> +	uint64_t leapsecond_tai_time;
> +
> +	/* Paired values of counter and UTC at a given point in time. */
> +	uint64_t counter_value;
> +	uint64_t utc_time_sec;
> +	uint64_t utc_time_frac_sec;
> +
> +	/* Counter frequency, and error margin. Units of (second >> 64) */
> +	uint64_t counter_period_frac_sec;

AFAIU this might limit the precision in case of high counter frequencies.
Could the unit be aligned to the expected frequency band of counters?

> +	uint64_t counter_period_error_rate_frac_sec;
> +
> +	/* Error margin of UTC reading above (± picoseconds) */
> +	uint64_t utc_time_maxerror_picosec;
> +};


  reply	other threads:[~2024-06-20 12:38 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-18  7:38 [RFC PATCH v3 0/7] Add virtio_rtc module and related changes Peter Hilber
2023-12-18  7:38 ` [virtio-dev] " Peter Hilber
2023-12-18  7:38 ` Peter Hilber
2023-12-18  7:38 ` [RFC PATCH v3 1/7] timekeeping: Fix cross-timestamp interpolation on counter wrap Peter Hilber
2024-02-19 16:35   ` [tip: timers/core] " tip-bot2 for Peter Hilber
2023-12-18  7:38 ` [RFC PATCH v3 2/7] timekeeping: Fix cross-timestamp interpolation corner case decision Peter Hilber
2024-02-19 16:35   ` [tip: timers/core] " tip-bot2 for Peter Hilber
2023-12-18  7:38 ` [RFC PATCH v3 3/7] timekeeping: Fix cross-timestamp interpolation for non-x86 Peter Hilber
2024-02-19 16:35   ` [tip: timers/core] " tip-bot2 for Peter Hilber
2023-12-18  7:38 ` [RFC PATCH v3 4/7] virtio_rtc: Add module and driver core Peter Hilber
2023-12-18  7:38   ` [virtio-dev] " Peter Hilber
2023-12-18  7:38   ` Peter Hilber
2023-12-18  7:38 ` [RFC PATCH v3 5/7] virtio_rtc: Add PTP clocks Peter Hilber
2023-12-18  7:38   ` [virtio-dev] " Peter Hilber
2023-12-18  7:38   ` Peter Hilber
2024-06-15  8:01   ` David Woodhouse
2024-06-20 12:01     ` Peter Hilber
2024-06-20 14:33       ` David Woodhouse
2023-12-18  7:38 ` [RFC PATCH v3 6/7] virtio_rtc: Add Arm Generic Timer cross-timestamping Peter Hilber
2023-12-18  7:38   ` [virtio-dev] " Peter Hilber
2023-12-18  7:38   ` Peter Hilber
2024-06-15  7:50   ` David Woodhouse
2024-06-20 12:06     ` Peter Hilber
2023-12-18  7:38 ` [RFC PATCH v3 7/7] virtio_rtc: Add RTC class driver Peter Hilber
2023-12-18  7:38   ` [virtio-dev] " Peter Hilber
2024-03-08 17:03   ` Alexandre Belloni
2024-03-11 18:28     ` Peter Hilber
2024-03-11 19:46       ` Alexandre Belloni
2024-03-13  9:13         ` Peter Hilber
2024-03-07 14:02 ` [RFC PATCH v3 0/7] Add virtio_rtc module and related changes David Woodhouse
2024-03-07 14:02   ` David Woodhouse
2024-03-08 10:32   ` Peter Hilber
2024-03-08 10:32     ` Peter Hilber
2024-03-08 12:33     ` David Woodhouse
2024-03-08 12:33       ` David Woodhouse
2024-03-11 18:24       ` Peter Hilber
2024-03-11 18:24         ` Peter Hilber
2024-03-12 17:15         ` David Woodhouse
2024-03-12 17:15           ` David Woodhouse
2024-03-13  9:45           ` Peter Hilber
2024-03-13  9:45             ` Peter Hilber
2024-03-13 11:18             ` Alexandre Belloni
2024-03-13 11:18               ` Alexandre Belloni
2024-03-13 12:29               ` David Woodhouse
2024-03-13 12:29                 ` David Woodhouse
2024-03-13 12:58                 ` Alexandre Belloni
2024-03-13 12:58                   ` Alexandre Belloni
2024-03-13 14:06                   ` David Woodhouse
2024-03-13 14:06                     ` David Woodhouse
2024-03-13 14:50                     ` Alexandre Belloni
2024-03-13 14:50                       ` Alexandre Belloni
2024-03-13 20:12                       ` Andrew Lunn
2024-03-13 20:12                         ` Andrew Lunn
2024-03-14  9:13                         ` Peter Hilber
2024-03-14  9:13                           ` Peter Hilber
2024-03-13 17:50                     ` Peter Hilber
2024-03-13 17:50                       ` Peter Hilber
2024-03-13 14:15               ` Peter Hilber
2024-03-13 14:15                 ` Peter Hilber
2024-03-13 12:45             ` David Woodhouse
2024-03-13 12:45               ` David Woodhouse
2024-03-13 17:50               ` Peter Hilber
2024-03-13 17:50                 ` Peter Hilber
2024-03-13 18:18                 ` David Woodhouse
2024-03-13 18:18                   ` David Woodhouse
2024-03-14 10:13                   ` Peter Hilber
2024-03-14 10:13                     ` Peter Hilber
2024-03-14 14:19                     ` David Woodhouse
2024-03-14 14:19                       ` David Woodhouse
2024-03-19 13:47                       ` Peter Hilber
2024-03-19 13:47                         ` Peter Hilber
2024-03-20 17:22                         ` David Woodhouse
2024-03-20 17:22                           ` David Woodhouse
2024-06-15  8:40 ` David Woodhouse
2024-06-20 12:37   ` Peter Hilber [this message]
2024-06-20 16:19     ` David Woodhouse
2024-06-21  8:45       ` David Woodhouse
2024-06-25 19:01         ` [RFC PATCH v2] ptp: Add vDSO-style vmclock support David Woodhouse
2024-06-25 21:34           ` Thomas Gleixner
2024-06-25 21:48             ` David Woodhouse
2024-06-25 22:22               ` John Stultz
2024-06-26  8:32                 ` David Woodhouse
2024-06-26 16:43             ` Richard Cochran
2024-06-21 14:02     ` [RFC PATCH v3 0/7] Add virtio_rtc module and related changes David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=671a784b-234f-4be6-80bf-5135e257ed40@opensynergy.com \
    --to=peter.hilber@opensynergy.com \
    --cc=a.zummo@towertech.it \
    --cc=alexandre.belloni@bootlin.com \
    --cc=christopher.s.hall@intel.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=dwmw2@infradead.org \
    --cc=jasowang@redhat.com \
    --cc=jstultz@google.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rtc@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=richardcochran@gmail.com \
    --cc=ridouxj@amazon.com \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=virtio-dev@lists.linux.dev \
    --cc=virtualization@lists.linux.dev \
    --cc=xuanzhuo@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.