From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932070AbbLAPFV (ORCPT <rfc822;w@1wt.eu>);
	Tue, 1 Dec 2015 10:05:21 -0500
Received: from mga03.intel.com ([134.134.136.65]:9010 "EHLO mga03.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1756212AbbLAPFQ (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Tue, 1 Dec 2015 10:05:16 -0500
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.20,369,1444719600"; 
   d="scan'208";a="862212822"
Subject: Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for
 SRIOV NIC
To: Alexander Duyck <alexander.duyck@gmail.com>
References: <1448372298-28386-1-git-send-email-tianyu.lan@intel.com>
 <5654722D.4010409@gmail.com> <56552888.90108@intel.com>
 <CAKgT0UcSrewenfM2YdVojHFFqfK2aVbBN5LH8=BFzc1p0f9hvQ@mail.gmail.com>
 <56556F98.5060507@intel.com>
 <CAKgT0UevBmRLpM1PvuuVDUW79A66RcPrfO8uGiqL1RiALW7apg@mail.gmail.com>
 <A12AC9D104E08D47BAF23C492F83C53B25CDE9E3@SHSMSX104.ccr.corp.intel.com>
 <CAKgT0Ucyq=7OSYBVvU9Z01b3f6scS+eBMLg+yphW7bwkNZosPQ@mail.gmail.com>
 <565BF285.4040507@intel.com>
 <CAKgT0Uc+-gFAetEfde6DmMOFK+vDE6UkgsNF8oLNqaQc4USSeg@mail.gmail.com>
Cc: "Dong, Eddie" <eddie.dong@intel.com>,
        "a.motakis@virtualopensystems.com" <a.motakis@virtualopensystems.com>,
        Alex Williamson <alex.williamson@redhat.com>,
        "b.reynal@virtualopensystems.com" <b.reynal@virtualopensystems.com>,
        Bjorn Helgaas <bhelgaas@google.com>,
        "Wyborny, Carolyn" <carolyn.wyborny@intel.com>,
        "Skidmore, Donald C" <donald.c.skidmore@intel.com>,
        "Jani, Nrupal" <nrupal.jani@intel.com>, Alexander Graf <agraf@suse.de>,
        "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
        Paolo Bonzini <pbonzini@redhat.com>,
        "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
        "Tantilov, Emil S" <emil.s.tantilov@intel.com>,
        Or Gerlitz <gerlitz.or@gmail.com>,
        "Rustad, Mark D" <mark.d.rustad@intel.com>,
        "Michael S. Tsirkin" <mst@redhat.com>,
        Eric Auger <eric.auger@linaro.org>,
        intel-wired-lan <intel-wired-lan@lists.osuosl.org>,
        "Kirsher, Jeffrey T" <jeffrey.t.kirsher@intel.com>,
        "Brandeburg, Jesse" <jesse.brandeburg@intel.com>,
        "Ronciak, John" <john.ronciak@intel.com>,
        "linux-api@vger.kernel.org" <linux-api@vger.kernel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "Williams, Mitch A" <mitch.a.williams@intel.com>,
        Netdev <netdev@vger.kernel.org>,
        "Nelson, Shannon" <shannon.nelson@intel.com>,
        Wei Yang <weiyang@linux.vnet.ibm.com>,
        "zajec5@gmail.com" <zajec5@gmail.com>
From: "Lan, Tianyu" <tianyu.lan@intel.com>
Message-ID: <565DB6FF.1050602@intel.com>
Date: Tue, 1 Dec 2015 23:04:31 +0800
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101
 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <CAKgT0Uc+-gFAetEfde6DmMOFK+vDE6UkgsNF8oLNqaQc4USSeg@mail.gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


On 12/1/2015 12:07 AM, Alexander Duyck wrote:
> They can only be corrected if the underlying assumptions are correct
> and they aren't.  Your solution would have never worked correctly.
> The problem is you assume you can keep the device running when you are
> migrating and you simply cannot.  At some point you will always have
> to stop the device in order to complete the migration, and you cannot
> stop it before you have stopped your page tracking mechanism.  So
> unless the platform has an IOMMU that is somehow taking part in the
> dirty page tracking you will not be able to stop the guest and then
> the device, it will have to be the device and then the guest.
>
>> >Doing suspend and resume() may help to do migration easily but some
>> >devices requires low service down time. Especially network and I got
>> >that some cloud company promised less than 500ms network service downtime.
> Honestly focusing on the downtime is getting the cart ahead of the
> horse.  First you need to be able to do this without corrupting system
> memory and regardless of the state of the device.  You haven't even
> gotten to that state yet.  Last I knew the device had to be up in
> order for your migration to even work.

I think the issue is that the content of an rx packet delivered to the
stack may change during migration, because that piece of memory won't be
migrated to the new machine. This may confuse applications or the stack.
The current dummy write solution ensures that the content of a packet
won't change after the dummy write, but the content may not be the
received data if migration happens before that point. We can recheck the
content via the checksum or CRC in the protocol after the dummy write to
ensure the content is what the VF received. I think the stack already
does such checks, and the packet will be dropped if it fails the check.
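
To make the "dummy write, then recheck" idea concrete, here is a rough
sketch in plain C. It is not taken from the patch set: dummy_write(),
inet_checksum() and rx_frame_ok() are hypothetical names, and the
RFC 1071 Internet checksum only stands in for whatever checksum or CRC
the protocol in use carries.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read and write back one byte in every page the
 * buffer overlaps, so that CPU-side dirty page tracking sees a write to
 * memory the device filled by DMA. */
static void dummy_write(volatile uint8_t *buf, size_t len, size_t page_size)
{
    for (size_t off = 0; off < len; off += page_size)
        buf[off] = buf[off];
    if (len)
        buf[len - 1] = buf[len - 1]; /* covers an unaligned last page */
}

/* RFC 1071 Internet checksum over big-endian 16-bit words. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)
        sum += (uint32_t)data[i] << 8; /* pad odd trailing byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Returns 1 if the data, including its embedded checksum field, still
 * verifies after the dummy write, 0 if it should be dropped. */
static int rx_frame_ok(uint8_t *buf, size_t len, size_t page_size)
{
    dummy_write(buf, len, page_size);
    return inet_checksum(buf, len) == 0;
}
```

The check runs after the dummy write on purpose: from that point on the
page is tracked as dirty, so what was verified is also what gets
migrated.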

Another way is to tell Qemu about all the memory the driver is using and
let Qemu migrate that memory after stopping the VCPU and the device. This
seems safe, but the implementation may be complex.
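
As a rough illustration of that second approach, the driver could keep a
table of the guest-physical ranges it uses for DMA, and the hypervisor
side could walk that table once the VCPU and the device are stopped. The
sketch below is only an assumption about how such a table might look:
the struct and function names, the fixed table size and the 4 KiB page
size are all hypothetical, and no existing Qemu or VFIO interface is
implied.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DMA_REGIONS 64   /* hypothetical fixed table size */
#define DMA_PAGE_SHIFT  12   /* assumes 4 KiB pages */

struct dma_region {
    uint64_t gpa;  /* guest-physical start address */
    uint64_t len;  /* length in bytes */
};

struct dma_region_table {
    size_t count;
    struct dma_region regions[MAX_DMA_REGIONS];
};

/* Driver side: record one DMA range. Returns 0 on success, -1 if the
 * range is empty or the table is full. */
static int dma_table_add(struct dma_region_table *t, uint64_t gpa,
                         uint64_t len)
{
    if (len == 0 || t->count >= MAX_DMA_REGIONS)
        return -1;
    t->regions[t->count].gpa = gpa;
    t->regions[t->count].len = len;
    t->count++;
    return 0;
}

/* Hypothetical callback through which the migration code would queue a
 * page frame for transfer. */
typedef void (*mark_dirty_fn)(uint64_t pfn, void *opaque);

/* Hypervisor side: after the VCPU and the device are stopped, visit
 * every page covered by the table. Returns the number of pages visited;
 * overlapping ranges are not merged. */
static uint64_t dma_table_mark_dirty(const struct dma_region_table *t,
                                     mark_dirty_fn mark, void *opaque)
{
    uint64_t pages = 0;

    for (size_t i = 0; i < t->count; i++) {
        uint64_t first = t->regions[i].gpa >> DMA_PAGE_SHIFT;
        uint64_t last = (t->regions[i].gpa + t->regions[i].len - 1)
                        >> DMA_PAGE_SHIFT;

        for (uint64_t pfn = first; pfn <= last; pfn++) {
            if (mark)
                mark(pfn, opaque);
            pages++;
        }
    }
    return pages;
}
```

The complexity mentioned above would mostly sit in keeping such a table
in sync with the driver's ring and buffer allocations, which this sketch
does not attempt.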
