QEMU-Devel Archive mirror
From: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
To: Peter Xu <peterx@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>,
	"Fabiano Rosas" <farosas@suse.de>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Cédric Le Goater" <clg@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Avihai Horon" <avihaih@nvidia.com>,
	"Joao Martins" <joao.m.martins@oracle.com>,
	qemu-devel@nongnu.org
Subject: Re: [PATCH RFC 00/26] Multifd 🔀 device state transfer support with VFIO consumer
Date: Fri, 26 Apr 2024 19:34:09 +0200	[thread overview]
Message-ID: <1a0b3c24-fffd-4db3-a35e-e40ae2e0a074@maciej.szmigiero.name> (raw)
In-Reply-To: <Zig3vebacR4SfJLh@x1n>

On 24.04.2024 00:35, Peter Xu wrote:
> On Wed, Apr 24, 2024 at 12:25:08AM +0200, Maciej S. Szmigiero wrote:
>> On 24.04.2024 00:20, Peter Xu wrote:
>>> On Tue, Apr 23, 2024 at 06:15:35PM +0200, Maciej S. Szmigiero wrote:
>>>> On 19.04.2024 17:31, Peter Xu wrote:
>>>>> On Fri, Apr 19, 2024 at 11:07:21AM +0100, Daniel P. Berrangé wrote:
>>>>>> On Thu, Apr 18, 2024 at 04:02:49PM -0400, Peter Xu wrote:
>>>>>>> On Thu, Apr 18, 2024 at 08:14:15PM +0200, Maciej S. Szmigiero wrote:
>>>>>>>> I think one of the reasons for these results is that mixed (RAM + device
>>>>>>>> state) multifd channels participate in the RAM sync process
>>>>>>>> (MULTIFD_FLAG_SYNC) whereas device state dedicated channels don't.
>>>>>>>
>>>>>>> Firstly, I'm wondering whether we can have better names for these new
>>>>>>> hooks.  Currently (only comment on the async* stuff):
>>>>>>>
>>>>>>>      - complete_precopy_async
>>>>>>>      - complete_precopy
>>>>>>>      - complete_precopy_async_wait
>>>>>>>
>>>>>>> But perhaps better:
>>>>>>>
>>>>>>>      - complete_precopy_begin
>>>>>>>      - complete_precopy
>>>>>>>      - complete_precopy_end
>>>>>>>
>>>>>>> ?
>>>>>>>
>>>>>>> I don't see why the device must do something asynchronous in such a
>>>>>>> hook.  To me it's more like you're splitting one process into multiple
>>>>>>> phases, so begin/end sounds more generic.
>>>>>>>
>>>>>>> Then, with that in mind, IIUC we can already split ram_save_complete()
>>>>>>> into more than one phase too. For example, I would be curious whether the
>>>>>>> performance goes back to normal if we offload multifd_send_sync_main() into
>>>>>>> complete_precopy_end(), because we really only need one shot of that, and I
>>>>>>> am quite surprised it already greatly affects VFIO dumping its own things.
>>>>>>>
>>>>>>> I would even go one step further, as Dan was asking: have you thought
>>>>>>> about dumping VFIO state via multifd even during iterations?  Would that
>>>>>>> help even more than this series (which IIUC only helps during the blackout
>>>>>>> phase)?
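
To make that proposed split concrete, here is a rough, self-contained sketch
of how begin/complete/end completion phases could be driven, with the single
multifd sync pushed into the final phase.  The driver function, type and
signatures are illustrative assumptions, not the actual SaveVMHandlers
definitions from the QEMU tree:

#include <stddef.h>

typedef struct QEMUFile QEMUFile;   /* opaque placeholder for the sketch */

typedef struct CompletionHooksSketch {
    /* start pushing this device's final state (possibly in the background) */
    int (*complete_precopy_begin)(QEMUFile *f, void *opaque);
    /* the classic synchronous completion path */
    int (*complete_precopy)(QEMUFile *f, void *opaque);
    /* wait for the background work; a single multifd sync would go here */
    int (*complete_precopy_end)(QEMUFile *f, void *opaque);
} CompletionHooksSketch;

static int complete_all_devices(CompletionHooksSketch *devs, size_t n,
                                QEMUFile *f, void *opaque)
{
    /* Phase 1: every device kicks off its final-state transfer. */
    for (size_t i = 0; i < n; i++) {
        if (devs[i].complete_precopy_begin &&
            devs[i].complete_precopy_begin(f, opaque) < 0) {
            return -1;
        }
    }
    /* Phase 2: the existing synchronous completion (e.g. RAM). */
    for (size_t i = 0; i < n; i++) {
        if (devs[i].complete_precopy &&
            devs[i].complete_precopy(f, opaque) < 0) {
            return -1;
        }
    }
    /* Phase 3: wait for everything; one sync for the whole completion. */
    for (size_t i = 0; i < n; i++) {
        if (devs[i].complete_precopy_end &&
            devs[i].complete_precopy_end(f, opaque) < 0) {
            return -1;
        }
    }
    return 0;
}
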
>>>>>>
>>>>>> To dump during RAM iteration, the VFIO device will need to have
>>>>>> dirty tracking and iterate on its state, because the guest CPUs
>>>>>> will still be running, potentially changing VFIO state. That seems
>>>>>> impractical in the general case.
>>>>>
>>>>> We already do such iterations in vfio_save_iterate()?
>>>>>
>>>>> My understanding is that the recent VFIO work is based on the fact that the
>>>>> VFIO device can track device state changes more or less (besides being able
>>>>> to save/load full states).  E.g. I still remember that in our QE tests some
>>>>> old devices reported many more dirty pages than expected during the
>>>>> iterations, back when we were looking into an issue where a huge amount of
>>>>> dirty pages was reported.  But newer models seem to have fixed that and
>>>>> report much less.
>>>>>
>>>>> That issue was about GPUs, not NICs, though, and IIUC a major portion of
>>>>> such tracking used to be for GPU vRAM.  So maybe I was mixing these up, and
>>>>> maybe they work differently.
>>>>
>>>> The device which this series was developed against (Mellanox ConnectX-7)
>>>> is already transferring its live state before the VM gets stopped (via
>>>> save_live_iterate SaveVMHandler).
>>>>
>>>> It's just that in addition to the live state it has more than 400 MiB
>>>> of state that cannot be transferred while the VM is still running.
>>>> And that fact hurts a lot with respect to the migration downtime.
>>>>
>>>> AFAIK it's a very similar story for (some) GPUs.
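
(A back-of-the-envelope illustration of why that hurts: 400 MiB is roughly
3.4 Gbit, so even assuming a single channel sustaining the ~15 Gbps ballpark
mentioned further down, transferring it during the blackout adds around
220 ms of downtime on its own; spreading it over several multifd channels
shrinks that proportionally.)
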
>>>
>>> So during the iteration phase VFIO cannot yet leverage the multifd
>>> channels with this series, am I right?
>>
>> That's right.
>>
>>> Is it possible to extend that use case too?
>>
>> I guess so, but since this phase (iteration while the VM is still
>> running) doesn't impact downtime, it is much less critical.
> 
> But it affects the bandwidth: e.g. even with multifd enabled, the device
> iteration data will still bottleneck at ~15 Gbps in the best case on a
> common system setup, even if the hosts are 100 Gbps direct-connected.
> Would that be a concern in the future too, or is it a known problem that
> won't be fixed anyway?

I think any improvements to the migration performance are good, even if
they don't impact downtime.

It's just that this patch set focuses on the downtime phase as the more
critical thing.

Once this gets improved there's no reason not to look at improving the
performance of the VM live phase too, if that brings sensible gains.

> I remember Avihai used to have a plan to look into similar issues; I hope
> this is exactly what he is looking for.  Otherwise, changing the migration
> protocol from time to time is cumbersome; we always need to provide a flag
> to make sure old systems migrate the old way and new systems run the new
> way, and for such a relatively major change I'd want to double-check how
> far away we are from supporting offloading VFIO iteration data to multifd.

The device state transfer is indicated by a new flag in the multifd
header (MULTIFD_FLAG_DEVICE_STATE).

If we are to use multifd channels for VM live phase transfers, these
could simply re-use the same flag type.
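
For reference, a minimal sketch of how such a flag could sit in the multifd
packet header; the bit value and field layout below are illustrative
assumptions, not the exact definitions from migration/multifd.h or from this
series:

#include <stdint.h>
#include <stdbool.h>

#define MULTIFD_FLAG_SYNC          (1 << 0)  /* existing RAM sync point     */
#define MULTIFD_FLAG_DEVICE_STATE  (1 << 4)  /* hypothetical bit assignment */

typedef struct MultiFDPacketSketch {
    uint32_t magic;
    uint32_t version;
    uint32_t flags;             /* MULTIFD_FLAG_* bits above */
    uint32_t next_packet_size;
    uint64_t packet_num;
    /* ... payload description follows (RAM pages or a device state chunk) */
} MultiFDPacketSketch;

static inline bool multifd_packet_is_device_state(const MultiFDPacketSketch *p)
{
    return p->flags & MULTIFD_FLAG_DEVICE_STATE;
}

With something along those lines, re-using it for the live phase would mostly
mean that the iteration-time producers also set the same flag, and the
receive side keeps routing packets to either the RAM path or the device state
load path based on it.
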

> Thanks,
> 

Thanks,
Maciej




Thread overview: 54+ messages
2024-04-16 14:42 [PATCH RFC 00/26] Multifd 🔀 device state transfer support with VFIO consumer Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 01/26] migration: Add x-channel-header pseudo-capability Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 02/26] migration: Add migration channel header send/receive Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 03/26] migration: Add send/receive header for main channel Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 04/26] multifd: change multifd_new_send_channel_create() param type Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 05/26] migration: Add a DestroyNotify parameter to socket_send_channel_create() Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 06/26] multifd: pass MFDSendChannelConnectData when connecting sending socket Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 07/26] migration/postcopy: pass PostcopyPChannelConnectData when connecting sending preempt socket Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 08/26] migration: Allow passing migration header in migration channel creation Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 09/26] migration: Add send/receive header for postcopy preempt channel Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 10/26] migration: Add send/receive header for multifd channel Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 11/26] migration/options: Mapped-ram is not channel header compatible Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 12/26] migration: Enable x-channel-header pseudo-capability Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 13/26] vfio/migration: Add save_{iterate, complete_precopy}_started trace events Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 14/26] migration/ram: Add load start trace event Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 15/26] migration/multifd: Zero p->flags before starting filling a packet Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 16/26] migration: Add save_live_complete_precopy_async{, wait} handlers Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 17/26] migration: Add qemu_loadvm_load_state_buffer() and its handler Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 18/26] migration: Add load_finish handler and associated functions Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 19/26] migration: Add x-multifd-channels-device-state parameter Maciej S. Szmigiero
2024-04-16 14:42 ` [PATCH RFC 20/26] migration: Add MULTIFD_DEVICE_STATE migration channel type Maciej S. Szmigiero
2024-04-16 14:43 ` [PATCH RFC 21/26] migration/multifd: Device state transfer support - receive side Maciej S. Szmigiero
2024-04-16 14:43 ` [PATCH RFC 22/26] migration/multifd: Convert multifd_send_pages::next_channel to atomic Maciej S. Szmigiero
2024-04-16 14:43 ` [PATCH RFC 23/26] migration/multifd: Device state transfer support - send side Maciej S. Szmigiero
2024-04-29 20:04   ` Peter Xu
2024-05-06 16:25     ` Maciej S. Szmigiero
2024-04-16 14:43 ` [PATCH RFC 24/26] migration/multifd: Add migration_has_device_state_support() Maciej S. Szmigiero
2024-04-16 14:43 ` [PATCH RFC 25/26] vfio/migration: Multifd device state transfer support - receive side Maciej S. Szmigiero
2024-04-16 14:43 ` [PATCH RFC 26/26] vfio/migration: Multifd device state transfer support - send side Maciej S. Szmigiero
2024-04-17  8:36 ` [PATCH RFC 00/26] Multifd 🔀 device state transfer support with VFIO consumer Daniel P. Berrangé
2024-04-17 12:11   ` Maciej S. Szmigiero
2024-04-17 16:35     ` Daniel P. Berrangé
2024-04-18  9:50       ` Maciej S. Szmigiero
2024-04-18 10:39         ` Daniel P. Berrangé
2024-04-18 18:14           ` Maciej S. Szmigiero
2024-04-18 20:02             ` Peter Xu
2024-04-19 10:07               ` Daniel P. Berrangé
2024-04-19 15:31                 ` Peter Xu
2024-04-23 16:15                   ` Maciej S. Szmigiero
2024-04-23 22:20                     ` Peter Xu
2024-04-23 22:25                       ` Maciej S. Szmigiero
2024-04-23 22:35                         ` Peter Xu
2024-04-26 17:34                           ` Maciej S. Szmigiero [this message]
2024-04-29 15:09                             ` Peter Xu
2024-05-06 16:26                               ` Maciej S. Szmigiero
2024-05-06 17:56                                 ` Peter Xu
2024-05-07  8:41                                   ` Avihai Horon
2024-05-07 16:13                                     ` Peter Xu
2024-05-07 17:23                                       ` Avihai Horon
2024-04-23 16:14               ` Maciej S. Szmigiero
2024-04-23 22:27                 ` Peter Xu
2024-04-26 17:35                   ` Maciej S. Szmigiero
2024-04-29 20:34                     ` Peter Xu
2024-04-19 10:20             ` Daniel P. Berrangé
