From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753194AbbK3GyM (ORCPT ); Mon, 30 Nov 2015 01:54:12 -0500
Received: from mga03.intel.com ([134.134.136.65]:15500 "EHLO mga03.intel.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1750917AbbK3GyI (ORCPT ); Mon, 30 Nov 2015 01:54:08 -0500
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.20,364,1444719600"; d="scan'208";a="696494696"
Subject: Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC
To: Alexander Duyck, "Dong, Eddie"
References: <1448372298-28386-1-git-send-email-tianyu.lan@intel.com>
 <5654722D.4010409@gmail.com> <56552888.90108@intel.com>
 <56556F98.5060507@intel.com>
Cc: "a.motakis@virtualopensystems.com", Alex Williamson,
 "b.reynal@virtualopensystems.com", Bjorn Helgaas, "Wyborny, Carolyn",
 "Skidmore, Donald C", "Jani, Nrupal", Alexander Graf,
 "kvm@vger.kernel.org", Paolo Bonzini, "qemu-devel@nongnu.org",
 "Tantilov, Emil S", Or Gerlitz, "Rustad, Mark D", "Michael S. Tsirkin",
 Eric Auger, intel-wired-lan, "Kirsher, Jeffrey T", "Brandeburg, Jesse",
 "Ronciak, John", "linux-api@vger.kernel.org",
 "linux-kernel@vger.kernel.org", "Williams, Mitch A", Netdev,
 "Nelson, Shannon", Wei Yang, "zajec5@gmail.com"
From: "Lan, Tianyu"
Message-ID: <565BF285.4040507@intel.com>
Date: Mon, 30 Nov 2015 14:53:57 +0800
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/26/2015 11:56 AM, Alexander Duyck wrote:
>
> I am not saying you cannot modify the drivers, however what you are
> doing is far too invasive.
> Do you seriously plan on modifying all of
> the PCI device drivers out there in order to allow any device that
> might be direct assigned to a port to support migration? I certainly
> hope not. That is why I have said that this solution will not scale.

Current drivers are not migration-friendly, so a driver that wants to
support migration has to be changed. RFC PATCH V1 presented our ideas
for handling MMIO, ring state and DMA tracking during migration. These
mechanisms are common to most drivers. They may have been problematic
in the previous version, but that can be corrected later.

Using suspend and resume() may make migration easier, but some devices
require low service downtime. Networking is the main case: I have heard
that some cloud providers promise less than 500 ms of network service
downtime. So I think the performance impact should also be taken into
account when we design the framework.

>
> What I am counter proposing seems like a very simple proposition. It
> can be implemented in two steps.
>
> 1. Look at modifying dma_mark_clean(). It is a function called in
> the sync and unmap paths of the lib/swiotlb.c. If you could somehow
> modify it to take care of marking the pages you unmap for Rx as being
> dirty it will get you a good way towards your goal as it will allow
> you to continue to do DMA while you are migrating the VM.
>
> 2. Look at making use of the existing PCI suspend/resume calls that
> are there to support PCI power management. They have everything
> needed to allow you to pause and resume DMA for the device before and
> after the migration while retaining the driver state. If you can
> implement something that allows you to trigger these calls from the
> PCI subsystem such as hot-plug then you would have a generic solution
> that can be easily reproduced for multiple drivers beyond those
> supported by ixgbevf.

I glanced at the PCI hotplug code. Hotplug events are triggered by the
PCI hotplug controller, and these events are defined in the controller
specification.
It would be hard to add new events. We would also need to add
device-specific code to the PCI hotplug core, since it only adds and
removes PCI devices when it receives events. Modifying the Windows
hotplug code would be a further challenge. So we may need to find
another way.

>
> Thanks.
>
> - Alex
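Below is a minimal userspace sketch of the dirty-tracking idea from step 1 of the counter-proposal. It rests on the assumption this thread works from: the hypervisor's dirty log only records CPU writes to guest memory, and writes made by an assigned device through DMA never reach it. To compensate, the guest touches every page of an Rx buffer on the sync/unmap path, so those pages are re-sent to the migration target. All names here are hypothetical illustrations, not the swiotlb code or the real signature of dma_mark_clean(). That includes dma_mark_dirty_on_unmap(), cpu_touch_page(), the dirty_log array and the 64-page guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE ((uintptr_t)1 << PAGE_SHIFT)
#define GUEST_PAGES 64 /* hypothetical: a tiny 256 KiB guest */

/* Simplified stand-in for the kernel's enum dma_data_direction. */
enum dma_dir { DMA_TO_DEVICE, DMA_FROM_DEVICE, DMA_BIDIRECTIONAL };

/* Model of the hypervisor's dirty log: one flag per guest page. It is
 * only updated by CPU writes; device DMA writes never reach it. */
static bool dirty_log[GUEST_PAGES];

/* Models a CPU write to the page containing gpa. In a real guest this
 * would be an actual store (e.g. a read-modify-write of one word) that
 * the hypervisor's dirty tracking records. */
void cpu_touch_page(uintptr_t gpa)
{
    assert((gpa >> PAGE_SHIFT) < GUEST_PAGES);
    dirty_log[gpa >> PAGE_SHIFT] = true;
}

/* Hypothetical hook for the DMA sync/unmap path (where swiotlb calls
 * dma_mark_clean()). If the device may have written the buffer (Rx),
 * touch every page it spans so those pages get re-sent to the target. */
void dma_mark_dirty_on_unmap(uintptr_t gpa, size_t len, enum dma_dir dir)
{
    if (len == 0 || dir == DMA_TO_DEVICE)
        return; /* device only read the buffer: no new data to migrate */

    uintptr_t first = gpa >> PAGE_SHIFT;
    uintptr_t last = (gpa + len - 1) >> PAGE_SHIFT;
    for (uintptr_t pfn = first; pfn <= last; pfn++)
        cpu_touch_page(pfn << PAGE_SHIFT);
}

/* Number of pages the dirty log currently reports. */
size_t count_dirty(void)
{
    size_t n = 0;
    for (size_t i = 0; i < GUEST_PAGES; i++)
        n += dirty_log[i];
    return n;
}

/* Models the migration code consuming and resetting the dirty log. */
void clear_dirty_log(void)
{
    memset(dirty_log, 0, sizeof(dirty_log));
}
```

Consider a 1500-byte Rx buffer that starts 100 bytes before a page boundary. It spans two pages, and both end up in the dirty log. A Tx buffer (DMA_TO_DEVICE) leaves the log untouched. The sketch only covers dirty tracking. Pausing DMA around the final stop-and-copy phase (step 2) is a separate problem.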