Message-ID: <558D04F0.5050904@huawei.com>
Date: Fri, 26 Jun 2015 15:53:20 +0800
From: zhanghailiang
References: <1434450415-11339-1-git-send-email-dgilbert@redhat.com> <1434450415-11339-2-git-send-email-dgilbert@redhat.com> <558CF559.9060208@cn.fujitsu.com>
In-Reply-To: <558CF559.9060208@cn.fujitsu.com>
Subject: Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.
To: Yang Hongyang, "Dr. David Alan Gilbert (git)", qemu-devel@nongnu.org
Cc: aarcange@redhat.com, yamahata@private.email.ne.jp, quintela@redhat.com, liang.z.li@intel.com, peter.huangpeng@huawei.com, luis@cs.umu.se, amit.shah@redhat.com, pbonzini@redhat.com, david@gibson.dropbear.id.au

On 2015/6/26 14:46, Yang Hongyang wrote:
> Hi Dave,
>
> On 06/16/2015 06:26 PM, Dr. David Alan Gilbert (git) wrote:
>> From: "Dr. David Alan Gilbert"
>>
> [...]
>> += Postcopy =
>> +'Postcopy' migration is a way to deal with migrations that refuse to converge;
>> +its plus side is that there is an upper bound on the amount of migration traffic
>> +and time it takes, the down side is that during the postcopy phase, a failure of
>> +*either* side or the network connection causes the guest to be lost.
>> +
>> +In postcopy the destination CPUs are started before all the memory has been
>> +transferred, and accesses to pages that are yet to be transferred cause
>> +a fault that's translated by QEMU into a request to the source QEMU.
>
> I have an immature idea:
> Can we keep a source RAM cache on the destination QEMU, instead of requesting
> pages from the source QEMU? That is:
> - When start_postcopy is issued, the source will pause and __open another socket
>   (maybe another migration thread)__ to send the remaining dirty pages to the
>   destination; at the same time, the destination will start and cache the
>   remaining pages.

Er, it seems that the current implementation is just like what you described,
except for the RAM cache: after the switch to postcopy mode, the source side
sends the remaining dirty pages just as it does in precopy. It does not need
any cache at all; it just places each dirty page where it will be accessed.

> - When a page fault occurs, first look up the page in the CACHE; if it has not
>   yet been received, request it from the source QEMU.
> - Once the remaining dirty pages are transferred, the source QEMU can go.
>
> The existing postcopy mechanism does not need to be changed; just add the
> remaining page transfer mechanism and the RAM cache.
>
> I don't know if it is feasible or whether it will bring any improvement to
> postcopy; what do you think?
>
>> +
>> +Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
>> +doesn't finish in a given time the switch is made to postcopy.
>> +
>> +=== Enabling postcopy ===
>> +
>> +To enable postcopy (prior to the start of migration):
>> +
>> +migrate_set_capability x-postcopy-ram on
>> +
>> +The migration will still start in precopy mode, however issuing:
>> +
>> +migrate_start_postcopy
>> +
>> +will now cause the transition from precopy to postcopy.
>> +It can be issued immediately after migration is started or any
>> +time later on.  Issuing it after the end of a migration is harmless.
>> +
>> +=== Postcopy device transfer ===
>> +
>> +Loading of device data may cause the device emulation to access guest RAM
>> +that may trigger faults that have to be resolved by the source, as such
>> +the migration stream has to be able to respond with page data *during* the
>> +device load, and hence the device data has to be read from the stream completely
>> +before the device load begins to free the stream up.  This is achieved by
>> +'packaging' the device data into a blob that's read in one go.
>> +
>> +Source behaviour
>> +
>> +Until postcopy is entered the migration stream is identical to normal
>> +precopy, except for the addition of a 'postcopy advise' command at
>> +the beginning, to tell the destination that postcopy might happen.
>> +When postcopy starts the source sends the page discard data and then
>> +forms the 'package' containing:
>> +
>> +   Command: 'postcopy listen'
>> +   The device state
>> +      A series of sections, identical to the precopy streams device state stream
>> +      containing everything except postcopiable devices (i.e. RAM)
>> +   Command: 'postcopy run'
>> +
>> +The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
>> +contents are formatted in the same way as the main migration stream.
>> +
>> +Destination behaviour
>> +
>> +Initially the destination looks the same as precopy, with a single thread
>> +reading the migration stream; the 'postcopy advise' and 'discard' commands
>> +are processed to change the way RAM is managed, but don't affect the stream
>> +processing.
>> +
>> +------------------------------------------------------------------------------
>> +                        1      2     3     4 5                      6   7
>> +main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE  DEVICE RUN  )
>> +thread                             |       |
>> +                                   |     (page request)
>> +                                   |        \___
>> +                                   v            \
>> +listen thread:                     --- page -- page -- page -- page -- page --
>> +
>> +                                   a   b        c
>> +------------------------------------------------------------------------------
>> +
>> +On receipt of CMD_PACKAGED (1)
>> +   All the data associated with the package - the ( ... ) section in the
>> +diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
>> +recurses into qemu_loadvm_state_main to process the contents of the package (2)
>> +which contains commands (3,6) and devices (4...)
>> +
>> +On receipt of 'postcopy listen' - 3 -(i.e. the 1st command in the package)
>> +a new thread (a) is started that takes over servicing the migration stream,
>> +while the main thread carries on loading the package.  It loads normal
>> +background page data (b) but if during a device load a fault happens (5) the
>> +returned page (c) is loaded by the listen thread allowing the main threads
>> +device load to carry on.
>> +
>> +The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
>> +CPUs start running.
>> +At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
>> +and is no longer used by migration, while the listen thread carries
>> +on servicing page data until the end of migration.
>> +
>> +=== Postcopy states ===
>> +
>> +Postcopy moves through a series of states (see postcopy_state) from
>> +ADVISE->LISTEN->RUNNING->END
>> +
>> + Advise: Set at the start of migration if postcopy is enabled, even
>> +         if it hasn't had the start command; here the destination
>> +         checks that its OS has the support needed for postcopy, and performs
>> +         setup to ensure the RAM mappings are suitable for later postcopy.
>> +         (Triggered by reception of POSTCOPY_ADVISE command)
>> +
>> + Listen: The first command in the package, POSTCOPY_LISTEN, switches
>> +         the destination state to Listen, and starts a new thread
>> +         (the 'listen thread') which takes over the job of receiving
>> +         pages off the migration stream, while the main thread carries
>> +         on processing the blob.  With this thread able to process page
>> +         reception, the destination now 'sensitises' the RAM to detect
>> +         any access to missing pages (on Linux using the 'userfault'
>> +         system).
>> +
>> + Running: POSTCOPY_RUN causes the destination to synchronise all
>> +         state and start the CPUs and IO devices running.  The main
>> +         thread now finishes processing the migration package and
>> +         now carries on as it would for normal precopy migration
>> +         (although it can't do the cleanup it would do as it
>> +         finishes a normal migration).
>> +
>> + End: The listen thread can now quit, and perform the cleanup of migration
>> +         state, the migration is now complete.
>> +
>> +=== Source side page maps ===
>> +
>> +The source side keeps two bitmaps during postcopy; 'the migration bitmap'
>> +and 'sent map'.  The 'migration bitmap' is basically the same as in
>> +the precopy case, and holds a bit to indicate that page is 'dirty' -
>> +i.e. needs sending.  During the precopy phase this is updated as the CPU
>> +dirties pages, however during postcopy the CPUs are stopped and nothing
>> +should dirty anything any more.
>> +
>> +The 'sent map' is used for the transition to postcopy. It is a bitmap that
>> +has a bit set whenever a page is sent to the destination, however during
>> +the transition to postcopy mode it is masked against the migration bitmap
>> +(sentmap &= migrationbitmap) to generate a bitmap recording pages that
>> +have previously been sent but are now dirty again.  This masked
>> +sentmap is sent to the destination which discards those now dirty pages
>> +before starting the CPUs.
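
To make the masking step above concrete, here is a minimal sketch of how
the discard set falls out of the two bitmaps. It is in Python rather than
QEMU's C, the bitmaps are modelled as plain integers, and the names
discard_set and pages are hypothetical, not taken from the patch.

```python
def discard_set(sent_map: int, migration_bitmap: int) -> int:
    """Pages that were already sent but are dirty again (sentmap &= migrationbitmap)."""
    return sent_map & migration_bitmap


def pages(bitmap: int) -> list:
    """List the page indexes whose bit is set in an integer bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical 8-page guest, bit i stands for page i.
sent = 0b01101101   # pages 0, 2, 3, 5, 6 were sent during precopy
dirty = 0b00101010  # pages 1, 3, 5 are dirty at the switch to postcopy

to_discard = discard_set(sent, dirty)
print(pages(to_discard))  # pages the destination must drop before starting CPUs
```

Page 1 is dirty but was never sent, so the destination holds no stale copy
of it and it does not appear in the discard set; it is simply still
outstanding in the migration bitmap.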
>> +
>> +Note that the contents of the sentmap are sacrificed during the calculation
>> +of the discard set and thus aren't valid once in postcopy.  The dirtymap
>> +is still valid and is used to ensure that no page is sent more than once.  Any
>> +request for a page that has already been sent is ignored.  Duplicate requests
>> +such as this can happen as a page is sent at about the same time the
>> +destination accesses it.
>>
>
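
The duplicate-request rule in that last paragraph could be sketched as
below. This is only an illustration in Python, not the patch's C code:
the Source class and send_page are hypothetical, and clearing the dirty
bit at send time is my assumption about how the dirtymap prevents a page
from going out twice.

```python
class Source:
    """Toy model of the source side during postcopy (hypothetical names)."""

    def __init__(self, npages: int):
        # Stand-in for the dirtymap: True means the page still needs sending.
        self.dirty = [True] * npages
        self.sent_log = []

    def send_page(self, page: int) -> bool:
        """Send a page if it is still dirty; ignore it if it was already sent."""
        if not self.dirty[page]:
            return False          # duplicate request, ignored
        self.dirty[page] = False  # assumption: bit is cleared when the page is sent
        self.sent_log.append(page)
        return True


src = Source(npages=4)
background = src.send_page(2)  # background transfer happens to send page 2
requested = src.send_page(2)   # destination faults on page 2 at about the same time
print(background, requested, src.sent_log)
```

The second call returns False and the page appears only once in sent_log,
which is the behaviour the documentation describes for a request that
races with the background transfer.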