In-Reply-To: <5655DB99.3040007@intel.com>
References: <1448372298-28386-1-git-send-email-tianyu.lan@intel.com>
 <1448372298-28386-4-git-send-email-tianyu.lan@intel.com>
 <20151124230551-mutt-send-email-mst@redhat.com>
 <56554994.1090305@intel.com>
 <20151125142437-mutt-send-email-mst@redhat.com>
 <5655DB99.3040007@intel.com>
Date: Wed, 25 Nov 2015 08:24:38 -0800
Subject: Re: [RFC PATCH V2 3/3] Ixgbevf: Add migration support for ixgbevf driver
From: Alexander Duyck
To: "Lan, Tianyu"
Cc: "Michael S. Tsirkin", a.motakis@virtualopensystems.com,
 Alex Williamson, b.reynal@virtualopensystems.com, Bjorn Helgaas,
 Carolyn Wyborny, "Skidmore, Donald C", eddie.dong@intel.com,
 nrupal.jani@intel.com, Alexander Graf, kvm@vger.kernel.org,
 Paolo Bonzini, qemu-devel@nongnu.org, "Tantilov, Emil S", Or Gerlitz,
 "Rustad, Mark D", Eric Auger, intel-wired-lan, Jeff Kirsher,
 "Brandeburg, Jesse", "Ronciak, John", linux-api@vger.kernel.org,
 "linux-kernel@vger.kernel.org", "Vick, Matthew", Mitch Williams,
 Netdev, "Nelson, Shannon", Wei Yang, zajec5@gmail.com
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 25, 2015 at 8:02 AM, Lan, Tianyu wrote:
> On 11/25/2015 8:28 PM, Michael S. Tsirkin wrote:
>>
>> Frankly, I don't really see what this short term hack buys us,
>> and if it goes in, we'll have to maintain it forever.
>>
>
> The framework of how to notify VF about migration status won't be
> changed regardless of stopping VF or not before doing migration.
> We hope to reach agreement on this first. Tracking dirty memory still
> need to more discussions and we will continue working on it. Stop VF may
> help to work around the issue and make tracking easier.

The problem is that you still have to stop the device at some point, for
the same reason you have to halt the VM. You seem to think you can get
by without doing that, but you can't. All you do is open the system up
to multiple races if you leave the device running. The goal should be
to avoid stopping the device until the last possible moment; however, it
will still have to be stopped eventually. It isn't as if you can
migrate memory, leave the device doing DMA, and expect to get a clean
state.

I agree with Michael. The focus needs to be on addressing dirty page
tracking first. Once you have that, you could use a variation on the
bonding solution where you postpone the hot-plug event until near the
end of the migration, just before you halt the guest, instead of having
to do it before you start the migration. After that we could look at
optimizing things further by introducing a variation of hot-plug that
would pause the device, as I suggested, instead of removing it. At that
point you should have almost all of the key issues addressed, so that
you could drop the bond interface entirely.

>> Also, assuming you just want to do ifdown/ifup for some reason, it's
>> easy enough to do using a guest agent, in a completely generic way.
>>
>
> Just ifdown/ifup is not enough for migration. It needs to restore some PCI
> settings before doing ifup on the target machine

That is why I have been suggesting making use of the suspend/resume
logic that is already in place for PCI power management. In the case of
a suspend/resume we already have to deal with the fact that the device
will go through a D0->D3->D0 reset, so we have to restore all of the
existing state.
It would take a significant load off of Qemu since the guest would be
restoring its own state instead of making Qemu have to do all of the
device migration work.
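
To make the suspend/resume idea a bit more concrete, here is a rough toy
model in Python. It is only a sketch under stated assumptions: the
register names, ToyDevice, and ToyDriver are all hypothetical and do not
correspond to the ixgbevf driver or the PCI core. The only point it
illustrates is that a guest driver that already saves and restores its
state across D0->D3->D0 can restore that state onto a fresh device on
the target, without the hypervisor copying device state.

```python
from dataclasses import dataclass, field

# Hypothetical register set; a real VF has far more state than this.
DEFAULTS = {"msix_enable": 0, "bus_master": 0, "mtu": 1500}


@dataclass
class ToyDevice:
    """Toy device whose registers fall back to defaults in D3."""
    regs: dict = field(default_factory=lambda: dict(DEFAULTS))
    power_state: str = "D0"

    def enter_d3(self):
        # Entering D3 is modelled as losing all register contents.
        self.power_state = "D3"
        self.regs = dict(DEFAULTS)

    def enter_d0(self):
        self.power_state = "D0"


class ToyDriver:
    """Guest-side driver: snapshot on suspend, restore on resume."""

    def __init__(self, dev):
        self.dev = dev
        self.saved = None

    def suspend(self):
        self.saved = dict(self.dev.regs)  # snapshot before power-down
        self.dev.enter_d3()

    def resume(self, dev=None):
        # After a migration the driver may be bound to a different
        # device instance on the target (assumption of this sketch).
        if dev is not None:
            self.dev = dev
        self.dev.enter_d0()
        self.dev.regs.update(self.saved)  # guest restores its own state


source_dev = ToyDevice()
driver = ToyDriver(source_dev)
source_dev.regs.update(msix_enable=1, bus_master=1, mtu=9000)

driver.suspend()          # on the source, just before the guest is halted
target_dev = ToyDevice()  # fresh device on the target machine
driver.resume(target_dev)
```

In this model the target device ends up with the settings the guest had
programmed on the source, and nothing outside the guest had to know what
those settings were.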
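
On the earlier point about having to stop the device: the following toy
model in Python may help show why both dirty page tracking and a final
stop are needed. It is a deliberately simplified sketch, not how QEMU or
KVM implement migration. The function name precopy, the two-pass
structure, and the split of DMA writes into "early" and "late" halves
are all assumptions made for illustration.

```python
def precopy(memory, dma_writes, stop_device_first, track_dma):
    """Toy two-pass pre-copy migration; returns (source, destination).

    dma_writes is an ordered list of (page, value) the device would
    write. The first half lands between the bulk copy and the final
    pass. The second half lands after the final pass has copied its
    pages, unless the device was stopped first.
    """
    src = list(memory)
    dst = list(src)   # pass 1: bulk copy while guest and device run
    dirty = set()     # pages known to have changed since pass 1

    half = len(dma_writes) // 2
    for page, value in dma_writes[:half]:
        src[page] = value
        if track_dma:
            dirty.add(page)  # only visible if DMA writes are tracked

    # A stopped device issues no further DMA.
    late = [] if stop_device_first else dma_writes[half:]

    for page in sorted(dirty):  # final pass: guest is halted
        dst[page] = src[page]
    for page, value in late:    # device still running: races the copy
        src[page] = value

    return src, dst


mem = [0, 0, 0, 0]
writes = [(1, 11), (2, 22)]

clean = precopy(mem, writes, stop_device_first=True, track_dma=True)
racy = precopy(mem, writes, stop_device_first=False, track_dma=True)
untracked = precopy(mem, writes, stop_device_first=True, track_dma=False)
```

Only the first case ends with the destination matching the source. With
the device left running, the late write is missed; with no tracking of
DMA writes, the early write is missed even though the device was
stopped.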