From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: "Lan, Tianyu" <tianyu.lan@intel.com>,
	"Dong, Eddie" <eddie.dong@intel.com>,
	"a.motakis@virtualopensystems.com"
	<a.motakis@virtualopensystems.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	"b.reynal@virtualopensystems.com"
	<b.reynal@virtualopensystems.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"Wyborny, Carolyn" <carolyn.wyborny@intel.com>,
	"Skidmore, Donald C" <donald.c.skidmore@intel.com>,
	"Jani, Nrupal" <nrupal.jani@intel.com>,
	Alexander Graf <agraf@suse.de>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Tantilov, Emil S" <emil.s.tantilov@intel.com>,
	Or Gerlitz <gerlitz.or@gmail.com>,
	"Rustad, Mark D" <mark.d.rustad@intel.com>,
	Eric Auger <eric.auger@linaro.org>,
	intel-wired-lan <intel-wired-lan@lists.osuosl.org>,
	"Kirsher, Jeffrey T" <jeffrey.t.kirsher@intel.com>,
	"Brandeburg, Jesse" <jesse.brandeburg@intel.com>,
	"Ronciak, John" <john.ronciak@intel.com>,
	"linux-api@vger.kernel.org" <linux-api@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Williams, Mitch A" <mitch.a.williams@intel.com>,
	Netdev <netdev@vger.kernel.org>,
	"Nelson, Shannon" <shannon.nelson@intel.com>,
	Wei Yang <weiyang@linux.vnet.ibm.com>,
	"zajec5@gmail.com" <zajec5@gmail.com>
Subject: Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC
Date: Tue, 1 Dec 2015 19:37:51 +0200
Message-ID: <20151201193026-mutt-send-email-mst@redhat.com>
In-Reply-To: <CAKgT0UfLEJpV-KdqRGfzBeas8bdqfHCmT5Xc8iVVP03g_pQO8A@mail.gmail.com>

On Tue, Dec 01, 2015 at 09:04:32AM -0800, Alexander Duyck wrote:
> On Tue, Dec 1, 2015 at 7:28 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Dec 01, 2015 at 11:04:31PM +0800, Lan, Tianyu wrote:
> >>
> >>
> >> On 12/1/2015 12:07 AM, Alexander Duyck wrote:
> >> >They can only be corrected if the underlying assumptions are correct
> >> >and they aren't.  Your solution would have never worked correctly.
> >> >The problem is you assume you can keep the device running when you are
> >> >migrating and you simply cannot.  At some point you will always have
> >> >to stop the device in order to complete the migration, and you cannot
> >> >stop it before you have stopped your page tracking mechanism.  So
> >> >unless the platform has an IOMMU that is somehow taking part in the
> >> >dirty page tracking you will not be able to stop the guest and then
> >> >the device, it will have to be the device and then the guest.
> >> >
> >> >>>Doing suspend and resume() may make migration easier, but some
> >> >>>devices require low service downtime, especially network devices. I have
> >> >>>heard that some cloud companies promise less than 500ms of network service downtime.
> >> >Honestly focusing on the downtime is getting the cart ahead of the
> >> >horse.  First you need to be able to do this without corrupting system
> >> >memory and regardless of the state of the device.  You haven't even
> >> >gotten to that state yet.  Last I knew the device had to be up in
> >> >order for your migration to even work.
> >>
> >> I think the issue is that the content of an rx packet delivered to the stack may
> >> change during migration, because that piece of memory won't be migrated to the
> >> new machine. This may confuse applications or the stack. The current dummy write
> >> solution ensures the content of the packet won't change after the dummy write,
> >> but the content may not be the received data if migration happens before that
> >> point. We can recheck the content via the checksum or CRC in the protocol after
> >> the dummy write to ensure the content is what the VF received. I think the stack
> >> already does such checks, and the packet will be dropped if it fails to pass
> >> the check.
> >
> >
> > Most people nowadays rely on hardware checksums, so I don't think this can
> > fly.
> 
> Correct.  The checksum/crc approach will not work, since the checksum
> itself can be mangled by some features such as LRO or GRO.
> 
> >> Another way is to tell QEMU about all the memory the driver is using and let
> >> QEMU migrate that memory after stopping the VCPU and the device. This seems
> >> safe, but the implementation may be complex.
> >
> > Not really 100% safe.  See below.
> >
> > I think hiding these details behind the dma_* API does have
> > some appeal. In any case, it gives us good
> > terminology, as it covers what most drivers do.
> 
> That was kind of my thought.  If we were to build our own
> dma_mark_clean() type function that marks the DMA region dirty on
> sync or unmap, then that is half the battle right there, as we would
> at least be able to keep the regions consistent after they have left
> the driver.
> 
> > There are several components to this:
> > - dma_map_* needs to prevent the page from
> >   being migrated while the device is running.
> >   For example, expose some kind of bitmap from guest
> >   to host, and set a bit there while the page is mapped.
> >   What happens if we stop the guest and some
> >   bits are still set? See dma_alloc_coherent below
> >   for some ideas.
> 
> Yeah, I could see something like this working.  Maybe we could do
> something like what was done for the NX bit and make use of the upper
> order bits beyond the limits of the memory range to mark pages as
> non-migratable?
> 
> I'm curious.  What we have with a DMA mapped region is essentially
> shared memory between the guest and the device.  How would we resolve
> something like this with IVSHMEM, or are we blocked there as well in
> terms of migration?

I have some ideas. Will post later.
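
A minimal sketch of the guest-to-host bitmap from the dma_map_* bullet
quoted above, assuming 4 KiB pages and a fixed, small guest (MAX_PFNS).
All names (dma_map_hook(), dma_unmap_hook(), any_page_still_mapped())
are hypothetical stand-ins for where dma_map_* and dma_unmap_* would
update the bitmap and where the host would inspect it after stopping
the guest:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define MAX_PFNS      1024 /* hypothetical guest size: 4 MiB of 4 KiB pages */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical bitmap shared guest -> host: bit N set means pfn N is DMA-mapped. */
static unsigned long dma_mapped[MAX_PFNS / BITS_PER_WORD];

/* Set or clear the bit of every pfn touched by [phys, phys + len); len > 0.
 * Caveat: a single bit cannot track two mappings sharing one page; a real
 * design would need per-page counts. */
static void update_range(uint64_t phys, size_t len, bool set)
{
    assert(len > 0);
    uint64_t first = phys >> PAGE_SHIFT;
    uint64_t last = (phys + len - 1) >> PAGE_SHIFT;

    for (uint64_t pfn = first; pfn <= last; pfn++) {
        assert(pfn < MAX_PFNS);
        unsigned long mask = 1UL << (pfn % BITS_PER_WORD);
        /* Plain ops keep the sketch single-threaded; concurrent mappings would
         * need atomic bitops such as the kernel's set_bit()/clear_bit(). */
        if (set)
            dma_mapped[pfn / BITS_PER_WORD] |= mask;
        else
            dma_mapped[pfn / BITS_PER_WORD] &= ~mask;
    }
}

/* Hypothetical hooks for dma_map_* / dma_unmap_*. */
static void dma_map_hook(uint64_t phys, size_t len)   { update_range(phys, len, true); }
static void dma_unmap_hook(uint64_t phys, size_t len) { update_range(phys, len, false); }

static bool pfn_is_mapped(uint64_t pfn)
{
    return (dma_mapped[pfn / BITS_PER_WORD] >> (pfn % BITS_PER_WORD)) & 1UL;
}

/* Host side, after stopping the guest: any set bit means the device
 * may still be writing to that page. */
static bool any_page_still_mapped(void)
{
    for (size_t i = 0; i < MAX_PFNS / BITS_PER_WORD; i++)
        if (dma_mapped[i])
            return true;
    return false;
}
```

Mapping 0x2000 bytes at 0x3000 sets the bits for pfns 3 and 4; the
matching unmap clears them, so any_page_still_mapped() returns false
again.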

> > - dma_unmap_* needs to mark the page as dirty.
> >   This can be done by writing into the page.
> >
> > - dma_sync_* needs to mark the page as dirty.
> >   This is trickier, as we cannot change the data.
> >   One solution is using atomics.
> >   For example:
> >         int x = ACCESS_ONCE(*p);
> >         cmpxchg(p, x, x);
> >   Seems to do a write without changing page
> >   contents.
> 
> Like I said, we can probably kill two birds with one stone by just
> implementing our own dma_mark_clean() for x86 virtualized
> environments.
> 
> I'd say we could take your solution one step further and just use 0
> instead of bothering to read the value.  After all it won't write the
> area if the value at the offset is not 0.

Really, almost any atomic that has no side effect will do,
e.g. atomic OR with 0, or atomic AND with ffffffff.

It's just that cmpxchg already happens to have a portable
wrapper.
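
A userspace C11 sketch of the side-effect-free atomics discussed above:
cmpxchg(p, x, x) after a read, cmpxchg(p, 0, 0), OR with 0, and AND
with all ones. Each leaves the stored value unchanged; that is all the
code itself guarantees. The assumption that the access is logged as a
write by the hypervisor's dirty-page tracking is not verified here and
depends on the CPU and compiler: some compilers (LLVM on x86, for
example) may lower an idempotent read-modify-write such as OR-with-0 to
a fenced load, so the variants are not interchangeable in practice.
Kernel code would use the cmpxchg() wrapper mentioned above instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Variant from the thread: read x, then cmpxchg(p, x, x). */
static void touch_cmpxchg_same(_Atomic uint32_t *p)
{
    uint32_t x = atomic_load_explicit(p, memory_order_relaxed);
    /* On success, x is stored over x; on failure (*p changed in between)
     * nothing is stored by this call. Either way the contents are unchanged. */
    (void)atomic_compare_exchange_strong(p, &x, x);
}

/* Variant suggested in reply: cmpxchg(p, 0, 0) with no prior read. */
static void touch_cmpxchg_zero(_Atomic uint32_t *p)
{
    uint32_t expected = 0;
    /* Stores 0 only if the value is already 0; otherwise leaves it intact. */
    (void)atomic_compare_exchange_strong(p, &expected, 0u);
}

/* Other read-modify-writes with no effect on the value. */
static void touch_or_zero(_Atomic uint32_t *p)  { (void)atomic_fetch_or(p, 0u); }
static void touch_and_ones(_Atomic uint32_t *p) { (void)atomic_fetch_and(p, 0xffffffffu); }
```

All four are locked operations on x86, which is the performance cost
raised in the reply below.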

> The only downside is that
> this is a locked operation so we will take a pretty serious
> performance penalty when this is active.  As such my preference would
> be to hide the code behind some static key that we could then switch
> on in the event of a VM being migrated.
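
A sketch of the gated per-region dirtying described above, in the
spirit of the dma_mark_clean()-type function proposed earlier: touch
one word in every page of a buffer on unmap or sync, but only while a
migration flag is on. The atomic_bool is a userspace stand-in for a
kernel static key (in-kernel, something like DEFINE_STATIC_KEY_FALSE()
with static_branch_unlikely() and static_branch_enable() would play
that role). The function names, the 4 KiB page size, and the
word-alignment requirement are assumptions of the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Stand-in for a static key (hypothetical name); off except during migration. */
static atomic_bool migration_dirty_tracking = false;

/* Counts touched pages so the sketch can be checked; not part of the idea. */
static size_t pages_touched;

/* Same-value cmpxchg: a locked access that leaves the contents unchanged. */
static void touch_word(_Atomic uint32_t *w)
{
    uint32_t x = atomic_load_explicit(w, memory_order_relaxed);
    (void)atomic_compare_exchange_strong(w, &x, x);
}

/* Hypothetical hook for dma_unmap_* / dma_sync_*: touch the first word of
 * each page the buffer covers. Assumes buf and len are 4-byte aligned. */
static void dma_mark_region_dirty(_Atomic uint32_t *buf, size_t len)
{
    if (!atomic_load_explicit(&migration_dirty_tracking, memory_order_relaxed))
        return; /* fast path: no locked operations outside migration */

    uintptr_t p = (uintptr_t)buf;
    uintptr_t end = p + len;
    assert(p % sizeof(uint32_t) == 0 && len % sizeof(uint32_t) == 0);

    while (p < end) {
        touch_word((_Atomic uint32_t *)p);
        pages_touched++;
        /* Jump to the first word of the next page. */
        p = (p & ~(uintptr_t)(PAGE_SIZE - 1)) + PAGE_SIZE;
    }
}
```

With the flag off the hook returns immediately; with it on, a
page-aligned 12 KiB buffer costs three locked operations, and a 4 KiB
buffer starting mid-page costs two.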



> > - dma_alloc_coherent memory (e.g. device rings)
> >   must be migrated after the device has stopped modifying it.
> >   Just stopping the VCPU is not enough:
> >   you must make sure the device is not changing it.
> >
> >   Or maybe the device has some kind of ring flush operation;
> >   if there were a reasonably portable way to do this
> >   (e.g. a flush capability could maybe be added to SRIOV),
> >   then the hypervisor could do this.
> 
> This is where things start to get messy. I was suggesting
> suspend/resume to resolve this bit, but it might also be possible to
> deal with this by clearing the bus master enable bit for the VF.  If
> I am not mistaken, that should disable MSI-X interrupts and halt any
> DMA.  That should work as long as you have some mechanism that is
> tracking the pages in use for DMA.

A bigger issue is recovering afterwards.
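
A sketch of the bus-master-enable quiesce step suggested above. It
works on an in-memory copy of the VF's config space, which is an
assumption of the sketch; on a real system the write would go through
a config accessor (in-kernel, pci_clear_master() and pci_set_master()).
The function names are hypothetical. The PCI Command register is at
offset 0x04 and Bus Master Enable is bit 2.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_COMMAND        0x04   /* Command register offset in config space */
#define PCI_COMMAND_MASTER 0x0004 /* Bus Master Enable bit */

/* Config space is little-endian; these helpers access an in-memory copy. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static void cfg_write16(uint8_t *cfg, unsigned off, uint16_t val)
{
    cfg[off] = (uint8_t)(val & 0xff);
    cfg[off + 1] = (uint8_t)(val >> 8);
}

/* Hypothetical quiesce step: clear Bus Master Enable so the VF stops
 * issuing DMA (and, per the thread, MSI-X writes). Returns the old
 * Command value so it can be restored later. */
static uint16_t vf_stop_dma(uint8_t *cfg)
{
    uint16_t old = cfg_read16(cfg, PCI_COMMAND);
    cfg_write16(cfg, PCI_COMMAND, (uint16_t)(old & ~PCI_COMMAND_MASTER));
    return old;
}

/* Restoring the register re-enables DMA, but it does not by itself
 * restore ring or device state, which is the recovery problem above. */
static void vf_restore_command(uint8_t *cfg, uint16_t saved)
{
    cfg_write16(cfg, PCI_COMMAND, saved);
}
```

For example, a Command value of 0x0407 becomes 0x0403 after
vf_stop_dma() and returns to 0x0407 after vf_restore_command().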

> >   With existing devices,
> >   either do it after a device reset, or disable
> >   memory access in the IOMMU. Maybe both.
> 
> The problem is that disabling the device at the IOMMU will start to
> trigger master abort errors when it tries to access regions it no
> longer has access to.
> 
> >   In case you need to resume on the source, you
> >   really need to follow the same path
> >   as on the destination, preferably detecting
> >   the device reset and restoring the device
> >   state.
> 
> The problem with detecting the reset is that you would likely have to
> be polling to do something like that.

We could send some event to the guest to notify it about this,
through a new or existing channel.

Or we could make it possible for userspace to trigger this,
then notify the guest through the guest agent.

>  I believe the fm10k driver
> already has code like that in place, where it will detect a reset as
> part of its watchdog; however, the response time for that is something
> like 2 seconds.  That was one of the reasons I preferred something
> like hot-plug, as that should be functioning as soon as the guest is up
> and it is a mechanism that operates outside of the VF drivers.

That's pretty minor.
A bigger issue is making sure the guest does not crash
when the device is suddenly reset under its legs.

-- 
MST

WARNING: multiple messages have this Message-ID (diff)
From: Michael S. Tsirkin <mst@redhat.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC
Date: Tue, 1 Dec 2015 19:37:51 +0200	[thread overview]
Message-ID: <20151201193026-mutt-send-email-mst@redhat.com> (raw)
In-Reply-To: <CAKgT0UfLEJpV-KdqRGfzBeas8bdqfHCmT5Xc8iVVP03g_pQO8A@mail.gmail.com>

On Tue, Dec 01, 2015 at 09:04:32AM -0800, Alexander Duyck wrote:
> On Tue, Dec 1, 2015 at 7:28 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Dec 01, 2015 at 11:04:31PM +0800, Lan, Tianyu wrote:
> >>
> >>
> >> On 12/1/2015 12:07 AM, Alexander Duyck wrote:
> >> >They can only be corrected if the underlying assumptions are correct
> >> >and they aren't.  Your solution would have never worked correctly.
> >> >The problem is you assume you can keep the device running when you are
> >> >migrating and you simply cannot.  At some point you will always have
> >> >to stop the device in order to complete the migration, and you cannot
> >> >stop it before you have stopped your page tracking mechanism.  So
> >> >unless the platform has an IOMMU that is somehow taking part in the
> >> >dirty page tracking you will not be able to stop the guest and then
> >> >the device, it will have to be the device and then the guest.
> >> >
> >> >>>Doing suspend and resume() may help to do migration easily but some
> >> >>>devices requires low service down time. Especially network and I got
> >> >>>that some cloud company promised less than 500ms network service downtime.
> >> >Honestly focusing on the downtime is getting the cart ahead of the
> >> >horse.  First you need to be able to do this without corrupting system
> >> >memory and regardless of the state of the device.  You haven't even
> >> >gotten to that state yet.  Last I knew the device had to be up in
> >> >order for your migration to even work.
> >>
> >> I think the issue is that the content of rx package delivered to stack maybe
> >> changed during migration because the piece of memory won't be migrated to
> >> new machine. This may confuse applications or stack. Current dummy write
> >> solution can ensure the content of package won't change after doing dummy
> >> write while the content maybe not received data if migration happens before
> >> that point. We can recheck the content via checksum or crc in the protocol
> >> after dummy write to ensure the content is what VF received. I think stack
> >> has already done such checks and the package will be abandoned if failed to
> >> pass through the check.
> >
> >
> > Most people nowdays rely on hardware checksums so I don't think this can
> > fly.
> 
> Correct.  The checksum/crc approach will not work since it is possible
> for a checksum to even be mangled in the case of some features such as
> LRO or GRO.
> 
> >> Another way is to tell all memory driver are using to Qemu and let Qemu to
> >> migrate these memory after stopping VCPU and the device. This seems safe but
> >> implementation maybe complex.
> >
> > Not really 100% safe.  See below.
> >
> > I think hiding these details behind dma_* API does have
> > some appeal. In any case, it gives us a good
> > terminology as it covers what most drivers do.
> 
> That was kind of my thought.  If we were to build our own
> dma_mark_clean() type function that will mark the DMA region dirty on
> sync or unmap then that is half the battle right there as we would be
> able to at least keep the regions consistent after they have left the
> driver.
> 
> > There are several components to this:
> > - dma_map_* needs to prevent page from
> >   being migrated while device is running.
> >   For example, expose some kind of bitmap from guest
> >   to host, set bit there while page is mapped.
> >   What happens if we stop the guest and some
> >   bits are still set? See dma_alloc_coherent below
> >   for some ideas.
> 
> Yeah, I could see something like this working.  Maybe we could do
> something like what was done for the NX bit and make use of the upper
> order bits beyond the limits of the memory range to mark pages as
> non-migratable?
> 
> I'm curious.  What we have with a DMA mapped region is essentially
> shared memory between the guest and the device.  How would we resolve
> something like this with IVSHMEM, or are we blocked there as well in
> terms of migration?

I have some ideas. Will post later.

> > - dma_unmap_* needs to mark the page as dirty.
> >   This can be done by writing into the page.
> >
> > - dma_sync_* needs to mark the page as dirty.
> >   This is trickier, as we cannot change the data.
> >   One solution is using atomics.
> >   For example:
> >         int x = ACCESS_ONCE(*p);
> >         cmpxchg(p, x, x);
> >   Seems to do a write without changing page
> >   contents.
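The same trick expressed with C11 atomics, as a user-space sketch that can be compiled and tested outside the kernel; `touch_word_preserving` is a hypothetical name. Unlike the two-line version quoted above, it retries: if the word changes between the load and the compare-exchange, the compare fails and nothing is stored, so a single attempt does not guarantee a write.

```c
#include <assert.h>
#include <stdatomic.h>

/* Analogue of:  int x = ACCESS_ONCE(*p); cmpxchg(p, x, x);
 * Store the word's current value back into it, leaving the contents
 * unchanged while still performing an atomic write. */
static void touch_word_preserving(_Atomic int *p)
{
    int x = atomic_load_explicit(p, memory_order_relaxed);

    /* On failure, x is refreshed with the value now in *p and the loop
     * retries with that value as both expected and desired.  The weak
     * form may also fail spuriously, which the loop absorbs. */
    while (!atomic_compare_exchange_weak(p, &x, x))
        ;
}
```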
> 
> Like I said we can probably kill 2 birds with one stone by just
> implementing our own dma_mark_clean() for x86 virtualized
> environments.
> 
> I'd say we could take your solution one step further and just use 0
> instead of bothering to read the value.  After all it won't write the
> area if the value at the offset is not 0.
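The cmpxchg(p, 0, 0) variant as a C11 sketch (the name `cmpxchg_zero` is hypothetical). It skips the initial read: a word that holds 0 gets 0 stored back, and any other word is left untouched at the C level. At the instruction level the picture may differ: the Intel SDM documents that a locked CMPXCHG issues a write cycle to the destination even when the comparison fails, so whether a failed compare marks the page dirty for the hypervisor is worth checking rather than assuming.

```c
#include <assert.h>
#include <stdatomic.h>

/* Analogue of cmpxchg(p, 0, 0): returns the value found in *p, as the
 * kernel's cmpxchg() returns the old value. */
static int cmpxchg_zero(_Atomic int *p)
{
    int expected = 0;

    /* Succeeds (storing 0 over 0) only if *p was 0; on failure expected
     * receives the current value and *p is not modified. */
    atomic_compare_exchange_strong(p, &expected, 0);
    return expected;
}
```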

Really almost any atomic that has no side effect will do.
atomic or with 0
atomic and with ffffffff

It's just that cmpxchg already happens to have a portable
wrapper.
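The two alternatives listed above, written as C11 atomics in a user-space sketch (the function names are hypothetical). Each returns the old value, and since x | 0 and x & all-ones both equal x, no retry loop is needed. One caveat for user-space experiments: because these operations are idempotent, a compiler may lower them to a fenced load instead of a locked write (LLVM's x86 backend can do this in some cases), so they are not a reliable way to force a store outside the kernel, whose x86 atomics are written in inline assembly.

```c
#include <assert.h>
#include <stdatomic.h>

/* "atomic or with 0": x | 0 == x, so memory is left as it was. */
static unsigned int touch_or_zero(_Atomic unsigned int *p)
{
    return atomic_fetch_or(p, 0u);
}

/* "atomic and with ffffffff": ANDing with an all-ones mask (0xffffffff for a
 * 32-bit unsigned int) likewise leaves the value unchanged. */
static unsigned int touch_and_ones(_Atomic unsigned int *p)
{
    return atomic_fetch_and(p, ~0u);
}
```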

> The only downside is that
> this is a locked operation so we will take a pretty serious
> performance penalty when this is active.  As such my preference would
> be to hide the code behind some static key that we could then switch
> on in the event of a VM being migrated.
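A user-space sketch of the gating idea. In the kernel this would be a static key (`DEFINE_STATIC_KEY_FALSE()` tested with `static_branch_unlikely()` and switched by `static_branch_enable()`), which costs essentially nothing while off; the atomic flag below is only a stand-in that still pays a load and a branch. `migration_dirty_tracking`, `sync_for_cpu_hook` and `set_migration_tracking` are hypothetical names, and the counter exists only to make the two paths observable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for a static key: off normally, switched on when a
 * (hypothetical) migration-start notification arrives. */
static atomic_bool migration_dirty_tracking = false;

/* Number of times the expensive locked path ran (single-threaded sketch). */
static unsigned long locked_ops;

/* Hypothetical hook run from the dma_sync_* path. */
static void sync_for_cpu_hook(_Atomic unsigned int *word)
{
    if (!atomic_load_explicit(&migration_dirty_tracking, memory_order_relaxed))
        return;                         /* fast path: no locked instruction */

    /* Slow path: write the current value back without changing it. */
    unsigned int x = atomic_load_explicit(word, memory_order_relaxed);
    while (!atomic_compare_exchange_weak(word, &x, x))
        ;
    locked_ops++;
}

static void set_migration_tracking(bool on)
{
    atomic_store(&migration_dirty_tracking, on);
}
```

Switching the flag on only for the duration of a migration keeps the locked-operation penalty out of normal operation, which is the point of the static key.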



> > - dma_alloc_coherent memory (e.g. device rings)
> >   must be migrated after the device has stopped modifying it.
> >   Just stopping the VCPU is not enough:
> >   you must make sure the device is not changing it.
> >
> >   Or maybe the device has some kind of ring flush operation,
> >   if there was a reasonably portable way to do this
> >   (e.g. a flush capability could maybe be added to SRIOV)
> >   then hypervisor could do this.
> 
> This is where things start to get messy. I was suggesting
> suspend/resume to resolve this bit, but it might also be possible to
> deal with it by clearing the bus master enable bit for the VF.  If I
> am not mistaken, that should disable MSI-X interrupts and halt any
> DMA.  That should work as long as you have some mechanism that is
> tracking the pages in use for DMA.
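A sketch of the bus-master approach, operating on a byte array that stands in for the VF's configuration space (for example as exposed through a VFIO config region, an assumption here). The offset and bit values are the standard PCI ones, matching `PCI_COMMAND` and `PCI_COMMAND_MASTER` in Linux's `include/uapi/linux/pci_regs.h`; the helper names are hypothetical. Since MSI-X messages are themselves memory writes, clearing Bus Master Enable should also block them. Note that this only stops new transactions; it does nothing about ring state.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PCI_COMMAND        0x04  /* offset of the 16-bit Command register */
#define SKETCH_PCI_COMMAND_MASTER 0x04  /* Command bit 2: Bus Master Enable */

/* Configuration space is little-endian. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static void cfg_write16(uint8_t *cfg, unsigned int off, uint16_t val)
{
    cfg[off] = (uint8_t)(val & 0xff);
    cfg[off + 1] = (uint8_t)(val >> 8);
}

/* Clear Bus Master Enable so the VF stops issuing DMA and MSI-X writes.
 * Returns the old Command value so bus mastering can be restored later. */
static uint16_t vf_stop_bus_mastering(uint8_t *cfg)
{
    uint16_t cmd = cfg_read16(cfg, SKETCH_PCI_COMMAND);

    cfg_write16(cfg, SKETCH_PCI_COMMAND,
                (uint16_t)(cmd & ~SKETCH_PCI_COMMAND_MASTER));
    return cmd;
}
```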

A bigger issue is recovering afterwards.

> >   With existing devices,
> >   either do it after device reset, or disable
> >   memory access in the IOMMU. Maybe both.
> 
> The problem is that disabling the device at the IOMMU will start to
> trigger master abort errors when it tries to access regions it no
> longer has access to.
> 
> >   In case you need to resume on source, you
> >   really need to follow the same path
> >   as on destination, preferably detecting
> >   device reset and restoring the device
> >   state.
> 
> The problem with detecting the reset is that you would likely have to
> be polling to do something like that.

We could send some event to the guest to notify it about this
through a new or existing channel.

Or we could make it possible for userspace to trigger this,
then notify guest through the guest agent.

>  I believe the fm10k driver
> already has code like that in place where it will detect a reset as a
> part of its watchdog, however the response time is something like 2
> seconds for that.  That was one of the reasons I preferred something
> like hot-plug as that should be functioning as soon as the guest is up
> and it is a mechanism that operates outside of the VF drivers.
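A sketch of what watchdog-based reset detection amounts to, under stated assumptions; this is not the fm10k code. It assumes the driver can read back some register that a reset clears, and all names are hypothetical. The second helper makes the latency point concrete: with polls every P milliseconds, a reset is noticed at the first poll at or after it, so with P around 2000 it can go unnoticed for almost 2 seconds.

```c
#include <assert.h>
#include <stdint.h>

/* Outcome of one watchdog probe. */
enum vf_probe { VF_OK, VF_RESET, VF_GONE };

/* Hypothetical check: compare a register the driver programmed (and that a
 * reset is assumed to clear) with the value it expects.  An all-ones read is
 * the usual sign that the device did not answer at all. */
static enum vf_probe vf_watchdog_probe(uint32_t reg, uint32_t expected)
{
    if (reg == UINT32_C(0xffffffff))
        return VF_GONE;
    return reg == expected ? VF_OK : VF_RESET;
}

/* With polls at 0, P, 2P, ..., a reset at time t is first noticed at the
 * next poll at or after t, i.e. less than one period later. */
static uint64_t vf_reset_noticed_at(uint64_t reset_ms, uint64_t period_ms)
{
    assert(period_ms > 0);
    return (reset_ms + period_ms - 1) / period_ms * period_ms;
}
```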

That's pretty minor.
A bigger issue is making sure the guest does not crash
when the device is suddenly reset underneath it.

-- 
MST
