From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:52657)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1ZMV3P-0008BJ-VU for qemu-devel@nongnu.org;
	Tue, 04 Aug 2015 01:46:25 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1ZMV3M-0003X2-PQ for qemu-devel@nongnu.org;
	Tue, 04 Aug 2015 01:46:23 -0400
Received: from mx1.redhat.com ([209.132.183.28]:46862)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1ZMV3M-0003Wp-KI for qemu-devel@nongnu.org;
	Tue, 04 Aug 2015 01:46:20 -0400
Date: Tue, 4 Aug 2015 11:16:15 +0530
From: Amit Shah
Message-ID: <20150804054615.GI28564@grmbl.mre>
References: <1434450415-11339-1-git-send-email-dgilbert@redhat.com>
	<1434450415-11339-36-git-send-email-dgilbert@redhat.com>
	<20150727074318.GF12267@grmbl.mre>
	<20150731095045.GE2272@work-vm>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150731095045.GE2272@work-vm>
Subject: Re: [Qemu-devel] [PATCH v7 35/42] Don't sync dirty bitmaps in postcopy
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: "Dr. David Alan Gilbert"
Cc: aarcange@redhat.com, yamahata@private.email.ne.jp, quintela@redhat.com,
	liang.z.li@intel.com, qemu-devel@nongnu.org, luis@cs.umu.se,
	pbonzini@redhat.com, david@gibson.dropbear.id.au

On (Fri) 31 Jul 2015 [10:50:46], Dr. David Alan Gilbert wrote:
> * Amit Shah (amit.shah@redhat.com) wrote:
> > On (Tue) 16 Jun 2015 [11:26:48], Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert"
> > >
> > > Once we're in postcopy the source processors are stopped and memory
> > > shouldn't change any more, so there's no need to look at the dirty
> > > map.
> > >
> > > There are two notes to this:
> > >   1) If we do resync and a page had changed then the page would get
> > >      sent again, which the destination wouldn't allow (since it might
> > >      have also modified the page)
> > >   2) Before disabling this I'd seen very rare cases where a page had been
> > >      marked dirtied although the memory contents are apparently identical
> >
> > I suppose we don't know why.  Any way to send a message to the dest
> > with this info, so the dest can print out something?  That'll help in
> > debugging.  (I'm suggesting sending a message to the dest, because
> > after a migration, we don't ever think of looking at messages on the
> > src.  And chances are the dest could blow up after a migration is
> > successful because of such "corruption".)
>
> One way perhaps would be to do one more sync at the end, after migration
> is apparently finished, but before the socket was closed; that would
> detect these changes and you could send a message to the other end.  However,
> given that (2) I say that where I'd seen it the page contents were
> identical, this could be a false alarm, so we'd need to be careful.
> It also doesn't help you find out *why* it happens, since tracing
> back from a bit in the migration bitmap to the area of memory
> and the thing that marked it dirty is very hard.  The only way to do
> that, is to mark the memory as read-only and then get a backtrace
> to find out who tried to change it; but you don't want to do
> that on a normal build and cause the source to die.

Agreed - but some notification that something might be wrong is better
than having no clue at all and then struggling to debug an issue.  In
fact, a per-VM flag could be better: after multiple migrations, such
notifications could be lost in the logs of a previous host, which we
don't examine.

		Amit
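
To make the idea discussed above concrete, here is a rough, standalone
sketch of what a final check could look like: walk the dirty bitmap once
more after migration has apparently finished, count the pages flagged
dirty, and separate the ones whose contents really differ from the false
alarms described in note (2).  None of this is QEMU code.  Every name
below (check_late_dirty, struct vm_migration_flags, SKETCH_PAGE_SIZE) is
hypothetical, and the sketch assumes a reference copy of what was sent is
available to compare against, which the real source side does not keep.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration only */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical per-VM state that would have to travel with the VM, so the
 * information is not left behind in the logs of a previous host. */
struct vm_migration_flags {
    bool late_dirty_seen;     /* a page was flagged dirty after postcopy began */
    bool late_dirty_changed;  /* ...and its contents really differed */
};

struct late_dirty_report {
    size_t dirty_pages;    /* pages flagged by the final sync */
    size_t changed_pages;  /* of those, pages whose contents differ */
};

static bool sketch_test_bit(const unsigned long *bitmap, size_t nr)
{
    return (bitmap[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}

/* Scan the bitmap produced by one last sync.  'reference' is an assumed
 * copy of the memory as it was sent; comparing against it distinguishes
 * a real change from a page that was only marked dirty. */
static void check_late_dirty(const unsigned long *dirty_bitmap,
                             const unsigned char *ram,
                             const unsigned char *reference,
                             size_t nr_pages,
                             struct vm_migration_flags *flags,
                             struct late_dirty_report *report)
{
    assert(dirty_bitmap && ram && reference && flags && report);

    report->dirty_pages = 0;
    report->changed_pages = 0;

    for (size_t page = 0; page < nr_pages; page++) {
        size_t off = page * SKETCH_PAGE_SIZE;

        if (!sketch_test_bit(dirty_bitmap, page)) {
            continue;
        }
        report->dirty_pages++;
        if (memcmp(ram + off, reference + off, SKETCH_PAGE_SIZE) != 0) {
            report->changed_pages++;
        }
    }

    if (report->dirty_pages > 0) {
        flags->late_dirty_seen = true;
    }
    if (report->changed_pages > 0) {
        flags->late_dirty_changed = true;
    }
}
```

Keeping the two counters separate is the point of the sketch: a non-zero
dirty_pages with zero changed_pages corresponds to the "marked dirty but
identical" case, which probably deserves a quieter notification than a
page that actually changed.  It still says nothing about *why* the page
was marked dirty.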