From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 19 Jun 2015 18:52:24 +0100
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Message-ID: <20150619175223.GK2147@work-vm>
References: <1434450415-11339-1-git-send-email-dgilbert@redhat.com>
	<1434450415-11339-2-git-send-email-dgilbert@redhat.com>
	<F2CBF3009FA73547804AE4C663CAB28E55056F@shsmsx102.ccr.corp.intel.com>
	<55828133.90400@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <55828133.90400@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy
	works.
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: "aarcange@redhat.com" <aarcange@redhat.com>, "yamahata@private.email.ne.jp" <yamahata@private.email.ne.jp>, "quintela@redhat.com" <quintela@redhat.com>, "Li,
	Liang Z" <liang.z.li@intel.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, "luis@cs.umu.se" <luis@cs.umu.se>, "amit.shah@redhat.com" <amit.shah@redhat.com>, "david@gibson.dropbear.id.au" <david@gibson.dropbear.id.au>

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> 
> 
> On 18/06/2015 09:50, Li, Liang Z wrote:
> > Do you have any idea or plan for dealing with failures that happen
> > during the postcopy phase?
> > 
> > Losing the guest is too frightening for a cloud provider.  We had a
> > discussion with Alibaba, and they said that they can't use the postcopy
> > feature unless there is a mechanism to recover the guest.
> 
> There's no solution to this problem except rolling back to a previous
> snapshot.

Yes, and you might be able to avoid some of the pain if you COW'd
(copy-on-write) the disk data on the destination until the migration had
finished; that would allow you to restart the source VM in the state it
was in before postcopy started, although the network's view of it is
going to be very messy.
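As a rough sketch of the COW idea, assuming shared storage and qcow2 images (the paths below are hypothetical), the destination could be started on a throwaway qcow2 overlay whose backing file is the image the source was using.  Writes made during postcopy then land only in the overlay, so discarding it leaves the base image as the paused source last saw it.  This only builds `qemu-img` argument lists; it is not how QEMU itself handles the problem.

```python
def overlay_create_argv(base_image: str, overlay_image: str) -> list[str]:
    """Build a qemu-img command creating a qcow2 overlay on top of base_image.

    The destination VM would be started on overlay_image, so every write it
    makes during postcopy goes to the overlay and base_image is left alone.
    """
    return [
        "qemu-img", "create",
        "-f", "qcow2",      # format of the new overlay
        "-b", base_image,   # backing file: the image the source was using
        "-F", "qcow2",      # format of the backing file (assumed qcow2)
        overlay_image,
    ]


def rollback_argv(overlay_image: str) -> list[str]:
    """Discard the overlay if postcopy fails.

    base_image still matches the paused source's disk, so the source VM can
    be resumed from the state it had when postcopy started.
    """
    return ["rm", "-f", overlay_image]


print(overlay_create_argv("/srv/images/vm.qcow2", "/srv/images/vm-post.qcow2"))
```

On success you would still need to merge the overlay back into the base image, for example with a live block commit on the destination; that step is not shown.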

> To give an idea, an intended use case for postcopy is evacuating a
> datacenter within 30 minutes of a tsunami alert.  That's not a case
> where you care much about losing guests to network failures.

Well, you have to decide which option is best; you could always shut the
VM down and boot it up fresh in your new, safe data centre.  Which you
prefer depends on how confident you are that the VM would boot back up
safely, how long that would take, how much you trust the network during
the migration period, and how hard it is to know what will happen if you
explicitly shut the VM down.

> Cloud operators can use a combination of precopy and postcopy.  For
> example, I would not use postcopy for mass migration when doing
> host updates, but it can be used as a last resort before a scheduled
> downtime.
> 
> For example, say you're doing a rolling update and you want it complete
> by next Sunday.  90% of the guests are shut down by the customers or can
> be migrated successfully with precopy.  The others do not converge and
> their SLA does not let you throttle them to complete precopy migration.

Indeed, the interface lets you do that pretty easily: as long as you
have enabled postcopy, the migration starts in precopy mode and is fully
recoverable until you issue 'migrate_start_postcopy'.  You might issue
that once it has tried 'n' times and you can see that your workload
isn't going to converge.
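A minimal sketch of that switch-over policy, written against the QMP names used by later QEMU releases (`migrate-set-capabilities` with the `postcopy-ram` capability, `query-migrate`, and `migrate-start-postcopy`, the QMP counterpart of the HMP `migrate_start_postcopy` command); the v7 series under discussion may use different, experimental names.  The `max_passes` threshold stands in for the 'n' above and is a hypothetical tuning knob.  The code only builds and inspects JSON; it does not talk to a QEMU monitor.

```python
import json


def qmp(command: str, **arguments) -> str:
    """Serialise one QMP command as a single JSON line."""
    msg = {"execute": command}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg)


# Sent to both source and destination before 'migrate' is issued.
# Assumes the later, non-experimental capability name "postcopy-ram".
ENABLE_POSTCOPY = qmp(
    "migrate-set-capabilities",
    capabilities=[{"capability": "postcopy-ram", "state": True}],
)

# Sent to the source only; after this the migration can no longer fall back.
START_POSTCOPY = qmp("migrate-start-postcopy")


def should_start_postcopy(status: dict, max_passes: int) -> bool:
    """Decide from the 'return' object of a 'query-migrate' reply.

    Until START_POSTCOPY is sent the migration is plain precopy, so
    returning False is always the safe, recoverable choice.
    """
    if status.get("status") != "active":
        return False  # finished, failed or not started: nothing to switch
    # 'dirty-sync-count' grows roughly once per pass over guest RAM.
    passes = status.get("ram", {}).get("dirty-sync-count", 0)
    return passes >= max_passes
```

A management loop would poll `query-migrate`, pass the `return` object to `should_start_postcopy`, and send `START_POSTCOPY` the first time it returns True.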

Dave

> You then tell your customers that either they shut down and restart
> their instances before Saturday 8:00 PM, or they might be shut down
> forcibly.  Then, for customers who haven't rebooted, you can do
> postcopy; you have alerted them that something might go wrong.  So even
> though postcopy would not be a first choice, it can still help cloud
> operators.
> 
> Paolo
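The rolling-update plan above can be sketched as a per-guest decision.  The `Guest` fields and the strategy names are hypothetical, chosen only to mirror the steps Paolo lists; a real orchestrator would derive them from its own inventory and migration results.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Guest:
    name: str
    restarted_by_customer: bool  # customer shut down/restarted it themselves
    precopy_converged: bool      # a precopy attempt completed within SLA limits


def plan(guest: Guest, now: datetime, deadline: datetime) -> str:
    """Pick the migration strategy for one guest during the rolling update."""
    if guest.restarted_by_customer:
        return "none"      # the customer's own restart took care of it
    if guest.precopy_converged:
        return "precopy"   # the safe, recoverable path worked
    if now < deadline:
        return "wait"      # the customer still has time to restart
    return "postcopy"      # customer was warned; last resort before downtime
```

Postcopy only appears in the last branch, which matches the idea of using it for the few guests that neither restart nor converge.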
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK