From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Duyck
To: "Michael S. Tsirkin"
Cc: "Lan, Tianyu", a.motakis@virtualopensystems.com, Alex Williamson,
 b.reynal@virtualopensystems.com, Bjorn Helgaas, Carolyn Wyborny,
 "Skidmore, Donald C", eddie.dong@intel.com, nrupal.jani@intel.com,
 Alexander Graf, kvm@vger.kernel.org, Paolo Bonzini, qemu-devel@nongnu.org,
 "Tantilov, Emil S", Or Gerlitz, "Rustad, Mark D", Eric Auger,
 intel-wired-lan, Jeff Kirsher, "Brandeburg, Jesse", "Ronciak, John",
 linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, Mitch Williams,
 Netdev, "Nelson, Shannon", Wei Yang, zajec5@gmail.com
Date: Wed, 25 Nov 2015 09:24:51 -0800
Subject: Re: [RFC PATCH V2 3/3] Ixgbevf: Add migration support for ixgbevf driver
In-Reply-To: <20151125183435-mutt-send-email-mst@redhat.com>
References: <1448372298-28386-1-git-send-email-tianyu.lan@intel.com>
 <1448372298-28386-4-git-send-email-tianyu.lan@intel.com>
 <20151124230551-mutt-send-email-mst@redhat.com>
 <56554994.1090305@intel.com>
 <20151125142437-mutt-send-email-mst@redhat.com>
 <5655DB99.3040007@intel.com>
 <20151125183435-mutt-send-email-mst@redhat.com>

On Wed, Nov 25, 2015 at 8:39 AM, Michael S. Tsirkin wrote:
> On Wed, Nov 25, 2015 at 08:24:38AM -0800, Alexander Duyck wrote:
>> >> Also, assuming you just want to do ifdown/ifup for some reason, it's
>> >> easy enough to do using a guest agent, in a completely generic way.
>> >>
>> >
>> > Just ifdown/ifup is not enough for migration. It needs to restore some
>> > PCI settings before doing ifup on the target machine.
>>
>> That is why I have been suggesting making use of the suspend/resume logic
>> that is already in place for PCI power management. In the case of a
>> suspend/resume we already have to deal with the fact that the device will
>> go through a D0->D3->D0 reset, so we have to restore all of the existing
>> state. It would take a significant load off of Qemu, since the guest
>> would be restoring its own state instead of Qemu having to do all of the
>> device migration work.

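(Expanding a bit on the point above: the suspend/resume flow I keep
referring to is just the standard legacy PCI PM pair, roughly the sketch
below. This is a generic illustration, not the actual ixgbevf code;
vf_priv, vf_down() and vf_up() are placeholder names.)

#include <linux/pci.h>

/* Placeholder private state and helpers -- stand-ins for whatever the
 * real VF driver keeps and does. */
struct vf_priv {
        void *hw;                               /* device-specific state */
};

static void vf_down(struct vf_priv *priv) { /* quiesce rings, free IRQs */ }
static int vf_up(struct vf_priv *priv) { /* realloc rings, reprogram HW */ return 0; }

static int vf_suspend(struct pci_dev *pdev, pm_message_t state)
{
        struct vf_priv *priv = pci_get_drvdata(pdev);

        vf_down(priv);                          /* stop traffic, tear down rings */
        pci_save_state(pdev);                   /* snapshot config space */
        pci_disable_device(pdev);
        pci_set_power_state(pdev, PCI_D3hot);
        return 0;
}

static int vf_resume(struct pci_dev *pdev)
{
        struct vf_priv *priv = pci_get_drvdata(pdev);
        int err;

        pci_set_power_state(pdev, PCI_D0);
        pci_restore_state(pdev);                /* undo the D0->D3->D0 reset */
        err = pci_enable_device_mem(pdev);
        if (err)
                return err;
        pci_set_master(pdev);
        return vf_up(priv);                     /* rebuild the rest from driver state */
}

If the destination VF simply looks like a device that just came back from
D3, the guest driver already knows how to put it back together.
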
>
> That can work, though again, the issue is you need guest
> cooperation to migrate.

Right now the problem is you need to have guest cooperation anyway as you
need to have some way of tracking the dirty pages. If the IOMMU on the
host were to provide some sort of dirty page tracking then we could
exclude the guest from the equation, but until then we need the guest to
notify us of what pages it is letting the device dirty.

I'm still of the opinion that the best way to go there is to just modify
the DMA API that is used in the guest so that it supports some sort of
page flag modification or something along those lines so we can track all
of the pages that might be written to by the device.

> If you reset device on destination instead of restoring state,
> then that issue goes away, but maybe the downtime
> will be increased.

Yes, the downtime will be increased, but it shouldn't be by much.
Depending on the setup a VF with a single queue can have about 3MB of data
outstanding when you move the driver over. After that it is just a matter
of bringing the interface back up which should take only a few hundred
milliseconds assuming the PF is fairly responsive.

> Will it really? I think it's worth it to start with the
> simplest solution (reset on destination) and see
> what the effect is, then add optimizations.

Agreed. My thought would be to start with something like dma_mark_clean()
that could be used to take care of marking the pages for migration when
they are unmapped or synced.

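Just to make that concrete, what I have in mind is something along the
lines of the sketch below: a helper called from the DMA unmap/sync-for-cpu
paths that logs any page the device may have written. The helper name and
the migration_dirty_log bitmap are made up for illustration; only the hook
points in the DMA API are real.

#include <linux/types.h>
#include <linux/mm.h>
#include <linux/bitops.h>

/* Hypothetical per-guest dirty log, one bit per pfn. How it is exposed to
 * the hypervisor is a separate problem. */
extern unsigned long *migration_dirty_log;

static inline void dma_mark_dirty_for_migration(phys_addr_t paddr, size_t size)
{
        unsigned long pfn = paddr >> PAGE_SHIFT;
        unsigned long last = (paddr + size - 1) >> PAGE_SHIFT;

        for (; pfn <= last; pfn++)
                set_bit(pfn, migration_dirty_log);
}

/* The call sites would be dma_unmap_page()/dma_unmap_single() and the
 * dma_sync_*_for_cpu() variants for DMA_FROM_DEVICE and DMA_BIDIRECTIONAL
 * mappings, i.e. every point where a buffer the device may have written
 * is handed back to the CPU -- which is roughly where dma_mark_clean()
 * is hooked in today on the architectures that have it. */
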
> One thing that I've been thinking about for a while, is saving (some)
> state speculatively. For example, notify guest a bit before migration
> is done, so it can save device state. If guest responds quickly, you
> have state that can be restored. If it doesn't, still migrate, and it
> will have to reset on destination.

I'm not sure how much more device state we really need to save. The driver
in the guest has to have enough state to recover in the event of a device
failure resulting in a slot reset. To top it off, the driver is able to
reconfigure things probably about as quickly as we could if we were
restoring the state.

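For reference, that recovery path is the standard pci_error_handlers flow,
roughly the sketch below (the vf_* names are the same placeholders as in
the earlier sketch, not ixgbevf code). The point is that a driver which can
survive .slot_reset already carries enough state to reprogram the device
from scratch, which is most of what a migration target needs.

#include <linux/pci.h>

static pci_ers_result_t vf_io_error_detected(struct pci_dev *pdev,
                                             pci_channel_state_t state)
{
        vf_down(pci_get_drvdata(pdev));         /* stop traffic */
        if (state == pci_channel_io_perm_failure)
                return PCI_ERS_RESULT_DISCONNECT;
        pci_disable_device(pdev);
        return PCI_ERS_RESULT_NEED_RESET;       /* ask the core for a slot reset */
}

static pci_ers_result_t vf_io_slot_reset(struct pci_dev *pdev)
{
        if (pci_enable_device_mem(pdev))
                return PCI_ERS_RESULT_DISCONNECT;
        pci_set_master(pdev);
        return PCI_ERS_RESULT_RECOVERED;
}

static void vf_io_resume(struct pci_dev *pdev)
{
        vf_up(pci_get_drvdata(pdev));           /* reprogram the HW from driver state */
}

static const struct pci_error_handlers vf_err_handlers = {
        .error_detected = vf_io_error_detected,
        .slot_reset     = vf_io_slot_reset,
        .resume         = vf_io_resume,
};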