* [Qemu-devel] [PATCH v7 00/42] Postcopy implementation
@ 2015-06-16 10:26 Dr. David Alan Gilbert (git)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works Dr. David Alan Gilbert (git)
                   ` (42 more replies)
  0 siblings, 43 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

  This is the 7th cut of my version of postcopy; it is designed for use with
the Linux kernel additions posted by Andrea Arcangeli here:

git clone --reference linux -b userfault21
git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git

Note this API is slightly different from the last version; but the code is now
in the linux-mm tree, the API is getting stable and the kernel code is just
getting fixes now.

This qemu series can be found at:

https://github.com/orbitfp7/qemu.git
on the wp3-postcopy-v7 tag.

It addresses most of the previous review comments, but there are
still one or two I'm working on.

As with v6, the userfaultfd.h isn't included in the tree, so you'll
need a userfaultfd.h header (and syscall define) from Andrea's kernel.

This work has been partially funded by the EU Orbit project:
  see http://www.orbitproject.eu/about/

v7
  updated to Andrea's userfault 21 interface
  Don't restart the source after an error if we entered postcopy (thanks to Li Liang for reporting)
  Killed off dead local_tmp_page (thanks to Christoph Seifert for spotting that)
  Made sure the incoming page mutex is only initialised once
  Moved request alignment from destination to source; destination now just verifies the request is aligned
  Got rid of the 'dup' on the return-fd
     Just use the same fd and make the forward path own the fd
  Reworked the consumption of request pages off the queue to handle hosts with larger
    pages more cleanly
  Disallowed enabling postcopy+compression
     This is probably fixable, but it needs reworking of the decompression threads
     to place the pages atomically
  Disallowed postcopy+RDMA
     This is probably trickier to fix; RDMA drops the received data straight into memory using
     DMA; we would need to find a way to stop that. I can see a way to make it so that
     the precopy phase still uses RDMA and then stops using it later, but that would
     make it harder when a real fix came along.
  Minor fixups from review
  6 of the smaller patches from v6 are now already in head

TODO
  Testing on machines with hps!=tps (host page size != target page size)
  Tidy up the hps!=tps code on the receive side

Dr. David Alan Gilbert (42):
  Start documenting how postcopy works.
  Provide runtime Target page information
  Init page sizes in qtest
  qemu_ram_block_from_host
  Add qemu_get_buffer_less_copy to avoid copies some of the time
  Add wrapper for setting blocking status on a QEMUFile
  ram_debug_dump_bitmap: Dump a migration bitmap as text
  migrate_init: Call from savevm
  Rename save_live_complete to save_live_complete_precopy
  Return path: Open a return path on QEMUFile for sockets
  Return path: socket_writev_buffer: Block even on non-blocking fd's
  Migration commands
  Return path: Control commands
  Return path: Send responses from destination to source
  Return path: Source handling of return path
  Rework loadvm path for subloops
  Add migration-capability boolean for postcopy-ram.
  Add wrappers and handlers for sending/receiving the postcopy-ram
    migration messages.
  MIG_CMD_PACKAGED: Send a packaged chunk of migration stream
  Modify save_live_pending for postcopy
  postcopy: OS support test
  migrate_start_postcopy: Command to trigger transition to postcopy
  MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
  Add qemu_savevm_state_complete_postcopy
  Postcopy: Maintain sentmap and calculate discard
  postcopy: Incoming initialisation
  postcopy: ram_enable_notify to switch on userfault
  Postcopy: Postcopy startup in migration thread
  Postcopy end in migration_thread
  Page request:  Add MIG_RP_MSG_REQ_PAGES reverse command
  Page request: Process incoming page request
  Page request: Consume pages off the post-copy queue
  postcopy_ram.c: place_page and helpers
  Postcopy: Use helpers to map pages during migration
  Don't sync dirty bitmaps in postcopy
  Host page!=target page: Cleanup bitmaps
  Postcopy; Handle userfault requests
  Start up a postcopy/listener thread ready for incoming page data
  postcopy: Wire up loadvm_postcopy_handle_ commands
  End of migration for postcopy
  Disable mlock around incoming postcopy
  Inhibit ballooning during postcopy

 balloon.c                        |  11 +
 docs/migration.txt               | 167 +++++++
 exec.c                           |  64 ++-
 hmp-commands.hx                  |  15 +
 hmp.c                            |   7 +
 hmp.h                            |   1 +
 hw/ppc/spapr.c                   |   2 +-
 hw/virtio/virtio-balloon.c       |   4 +-
 include/exec/cpu-all.h           |   2 -
 include/exec/cpu-common.h        |   3 +
 include/migration/migration.h    | 112 ++++-
 include/migration/postcopy-ram.h |  88 ++++
 include/migration/qemu-file.h    |  10 +
 include/migration/vmstate.h      |   8 +-
 include/qemu/typedefs.h          |   3 +
 include/sysemu/balloon.h         |   2 +
 include/sysemu/sysemu.h          |  44 +-
 migration/Makefile.objs          |   2 +-
 migration/block.c                |   9 +-
 migration/migration.c            | 736 ++++++++++++++++++++++++++++--
 migration/postcopy-ram.c         | 706 +++++++++++++++++++++++++++++
 migration/qemu-file-unix.c       | 110 ++++-
 migration/qemu-file.c            |  74 ++++
 migration/ram.c                  | 933 ++++++++++++++++++++++++++++++++++++---
 migration/savevm.c               | 776 +++++++++++++++++++++++++++++---
 qapi-schema.json                 |  18 +-
 qmp-commands.hx                  |  19 +
 qtest.c                          |   1 +
 trace-events                     |  76 +++-
 29 files changed, 3788 insertions(+), 215 deletions(-)
 create mode 100644 include/migration/postcopy-ram.h
 create mode 100644 migration/postcopy-ram.c

-- 
2.4.3


* [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-06-17 11:42   ` Juan Quintela
                     ` (3 more replies)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 02/42] Provide runtime Target page information Dr. David Alan Gilbert (git)
                   ` (41 subsequent siblings)
  42 siblings, 4 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
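Illustration only, not part of the patch: a minimal sketch of the discard
calculation described in the 'Source side page maps' section of the
documentation added here (sentmap &= migrationbitmap).  The helper name
compute_discard_set and the plain unsigned long arrays are assumptions made
for the example; the real code uses QEMU's own bitmap helpers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: mask the sent map against the migration (dirty)
 * bitmap in place.  Afterwards a set bit means "already sent, but dirtied
 * again", i.e. a page the destination must discard before starting its CPUs.
 * Returns the number of pages in the discard set.
 */
static size_t compute_discard_set(unsigned long *sentmap,
                                  const unsigned long *migration_bitmap,
                                  size_t nwords)
{
    size_t discards = 0;

    assert(sentmap && migration_bitmap);
    for (size_t i = 0; i < nwords; i++) {
        sentmap[i] &= migration_bitmap[i];
        /* Count the set bits in this word, clearing the lowest each time */
        for (unsigned long w = sentmap[i]; w; w &= w - 1) {
            discards++;
        }
    }
    return discards;
}
```

As the documentation notes, the sent map is overwritten by this step, so it
no longer records "pages sent" once postcopy has started.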
 docs/migration.txt | 167 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 167 insertions(+)

diff --git a/docs/migration.txt b/docs/migration.txt
index f6df4be..b4b93d1 100644
--- a/docs/migration.txt
+++ b/docs/migration.txt
@@ -291,3 +291,170 @@ save/send this state when we are in the middle of a pio operation
 (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
 not enabled, the values on that fields are garbage and don't need to
 be sent.
+
+= Return path =
+
+In most migration scenarios there is only a single data path that runs
+from the source VM to the destination, typically along a single fd (although
+possibly with another fd or similar for some fast way of throwing pages across).
+
+However, some uses need two way communication; in particular the Postcopy destination
+needs to be able to request pages on demand from the source.
+
+For these scenarios there is a 'return path' from the destination to the source;
+qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
+path.
+
+  Source side
+     Forward path - written by migration thread
+     Return path  - opened by main thread, read by return-path thread
+
+  Destination side
+     Forward path - read by main thread
+     Return path  - opened by main thread, written by main thread AND postcopy
+                    thread (protected by rp_mutex)
+
+= Postcopy =
+'Postcopy' migration is a way to deal with migrations that refuse to converge;
+its plus side is that there is an upper bound on the amount of migration traffic
+and time it takes, the down side is that during the postcopy phase, a failure of
+*either* side or the network connection causes the guest to be lost.
+
+In postcopy the destination CPUs are started before all the memory has been
+transferred, and accesses to pages that are yet to be transferred cause
+a fault that's translated by QEMU into a request to the source QEMU.
+
+Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
+doesn't finish in a given time the switch is made to postcopy.
+
+=== Enabling postcopy ===
+
+To enable postcopy (prior to the start of migration):
+
+migrate_set_capability x-postcopy-ram on
+
+The migration will still start in precopy mode, however issuing:
+
+migrate_start_postcopy
+
+will now cause the transition from precopy to postcopy.
+It can be issued immediately after migration is started or any
+time later on.  Issuing it after the end of a migration is harmless.
+
+=== Postcopy device transfer ===
+
+Loading of device data may cause the device emulation to access guest RAM
+that may trigger faults that have to be resolved by the source; as such
+the migration stream has to be able to respond with page data *during* the
+device load, and hence the device data has to be read from the stream completely
+before the device load begins to free the stream up.  This is achieved by
+'packaging' the device data into a blob that's read in one go.
+
+Source behaviour
+
+Until postcopy is entered the migration stream is identical to normal
+precopy, except for the addition of a 'postcopy advise' command at
+the beginning, to tell the destination that postcopy might happen.
+When postcopy starts the source sends the page discard data and then
+forms the 'package' containing:
+
+   Command: 'postcopy listen'
+   The device state
+      A series of sections, identical to the precopy stream's device state stream
+      containing everything except postcopiable devices (i.e. RAM)
+   Command: 'postcopy run'
+
+The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
+contents are formatted in the same way as the main migration stream.
+
+Destination behaviour
+
+Initially the destination looks the same as precopy, with a single thread
+reading the migration stream; the 'postcopy advise' and 'discard' commands
+are processed to change the way RAM is managed, but don't affect the stream
+processing.
+
+------------------------------------------------------------------------------
+                        1      2   3     4 5                      6   7
+main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
+thread                             |       |
+                                   |     (page request)
+                                   |        \___
+                                   v            \
+listen thread:                     --- page -- page -- page -- page -- page --
+
+                                   a   b        c
+------------------------------------------------------------------------------
+
+On receipt of CMD_PACKAGED (1)
+   All the data associated with the package - the ( ... ) section in the
+diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
+recurses into qemu_loadvm_state_main to process the contents of the package (2)
+which contains commands (3,6) and devices (4...)
+
+On receipt of 'postcopy listen' (3) (i.e. the 1st command in the package)
+a new thread (a) is started that takes over servicing the migration stream,
+while the main thread carries on loading the package.   It loads normal
+background page data (b) but if during a device load a fault happens (5) the
+returned page (c) is loaded by the listen thread allowing the main thread's
+device load to carry on.
+
+The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
+CPUs start running.
+At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
+and is no longer used by migration, while the listen thread carries
+on servicing page data until the end of migration.
+
+=== Postcopy states ===
+
+Postcopy moves through a series of states (see postcopy_state) from
+ADVISE->LISTEN->RUNNING->END
+
+  Advise: Set at the start of migration if postcopy is enabled, even
+          if it hasn't had the start command; here the destination
+          checks that its OS has the support needed for postcopy, and performs
+          setup to ensure the RAM mappings are suitable for later postcopy.
+          (Triggered by reception of POSTCOPY_ADVISE command)
+
+  Listen: The first command in the package, POSTCOPY_LISTEN, switches
+          the destination state to Listen, and starts a new thread
+          (the 'listen thread') which takes over the job of receiving
+          pages off the migration stream, while the main thread carries
+          on processing the blob.  With this thread able to process page
+          reception, the destination now 'sensitises' the RAM to detect
+          any access to missing pages (on Linux using the 'userfault'
+          system).
+
+  Running: POSTCOPY_RUN causes the destination to synchronise all
+          state and start the CPUs and IO devices running.  The main
+          thread now finishes processing the migration package and
+          now carries on as it would for normal precopy migration
+          (although it can't do the cleanup it would do as it
+          finishes a normal migration).
+
+  End: The listen thread can now quit, and perform the cleanup of migration
+          state; the migration is now complete.
+
+=== Source side page maps ===
+
+The source side keeps two bitmaps during postcopy; 'the migration bitmap'
+and 'sent map'.  The 'migration bitmap' is basically the same as in
+the precopy case, and holds a bit to indicate that a page is 'dirty' -
+i.e. needs sending.  During the precopy phase this is updated as the CPU
+dirties pages, however during postcopy the CPUs are stopped and nothing
+should dirty anything any more.
+
+The 'sent map' is used for the transition to postcopy. It is a bitmap that
+has a bit set whenever a page is sent to the destination, however during
+the transition to postcopy mode it is masked against the migration bitmap
+(sentmap &= migrationbitmap) to generate a bitmap recording pages that
+have previously been sent but are now dirty again.  This masked
+sentmap is sent to the destination which discards those now dirty pages
+before starting the CPUs.
+
+Note that the contents of the sentmap are sacrificed during the calculation
+of the discard set and thus aren't valid once in postcopy.  The dirtymap
+is still valid and is used to ensure that no page is sent more than once.  Any
+request for a page that has already been sent is ignored.  Duplicate requests
+such as this can happen as a page is sent at about the same time the
+destination accesses it.
-- 
2.4.3


* [Qemu-devel] [PATCH v7 02/42] Provide runtime Target page information
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-06-17 11:43   ` Juan Quintela
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 03/42] Init page sizes in qtest Dr. David Alan Gilbert (git)
                   ` (40 subsequent siblings)
  42 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The migration code is generally built target-independent; however,
there are a few places where knowing the target page size would
avoid artificially moving stuff into migration/ram.c.

Provide 'qemu_target_page_bits()' that returns TARGET_PAGE_BITS
to other bits of code so that they can stay target-independent.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
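Illustration only, not part of the patch: a sketch of how target-independent
code might derive a page size, a page index and an aligned address from the
run-time accessor instead of the TARGET_PAGE_* macros.  The stub body
returning 12 (4 KiB pages) and the three helper names are assumptions made
for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the accessor added by this patch; 12 is an assumed value. */
static size_t qemu_target_page_bits(void)
{
    return 12;
}

/* Hypothetical helper: target page size in bytes, computed at run time. */
static uint64_t target_page_size(void)
{
    return UINT64_C(1) << qemu_target_page_bits();
}

/* Hypothetical helper: index of the bitmap bit that covers 'addr'. */
static uint64_t addr_to_page_index(uint64_t addr)
{
    return addr >> qemu_target_page_bits();
}

/* Hypothetical helper: round 'addr' down to a target page boundary. */
static uint64_t page_align_down(uint64_t addr)
{
    return addr & ~(target_page_size() - 1);
}
```

Code written this way needs no per-target rebuild, which is the point of
exposing the value through a function.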
 exec.c                  | 10 ++++++++++
 include/sysemu/sysemu.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/exec.c b/exec.c
index 76bfc4a..81a4481 100644
--- a/exec.c
+++ b/exec.c
@@ -3313,6 +3313,16 @@ int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr,
     }
     return 0;
 }
+
+/*
+ * Allows code that needs to deal with migration bitmaps etc to still be built
+ * target independent.
+ */
+size_t qemu_target_page_bits(void)
+{
+    return TARGET_PAGE_BITS;
+}
+
 #endif
 
 /*
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index ef793f7..7e42f76 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -68,6 +68,7 @@ int qemu_reset_requested_get(void);
 void qemu_system_killed(int signal, pid_t pid);
 void qemu_devices_reset(void);
 void qemu_system_reset(bool report);
+size_t qemu_target_page_bits(void);
 
 void qemu_add_exit_notifier(Notifier *notify);
 void qemu_remove_exit_notifier(Notifier *notify);
-- 
2.4.3


* [Qemu-devel] [PATCH v7 03/42] Init page sizes in qtest
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works Dr. David Alan Gilbert (git)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 02/42] Provide runtime Target page information Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-06-17 11:49   ` Juan Quintela
                     ` (2 more replies)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 04/42] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
                   ` (39 subsequent siblings)
  42 siblings, 3 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

One of my patches used a loop that was based on host page size;
it dies in qtest since qtest hadn't bothered init'ing it.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 qtest.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/qtest.c b/qtest.c
index 05cefd2..8e10340 100644
--- a/qtest.c
+++ b/qtest.c
@@ -657,6 +657,7 @@ void qtest_init(const char *qtest_chrdev, const char *qtest_log, Error **errp)
 
     inbuf = g_string_new("");
     qtest_chr = chr;
+    page_size_init();
 }
 
 bool qtest_driver(void)
-- 
2.4.3


* [Qemu-devel] [PATCH v7 04/42] qemu_ram_block_from_host
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (2 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 03/42] Init page sizes in qtest Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-06-17 11:54   ` Juan Quintela
  2015-07-10  8:36   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 05/42] Add qemu_get_buffer_less_copy to avoid copies some of the time Dr. David Alan Gilbert (git)
                   ` (38 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy sends RAMBlock names and offsets over the wire (since it can't
rely on the order of ramaddr being the same), and it starts out with
HVA fault addresses from the kernel.

qemu_ram_block_from_host translates an HVA into a RAMBlock, an offset
in the RAMBlock and the global ram_addr_t value.

Rewrite qemu_ram_addr_from_host to use qemu_ram_block_from_host.

Provide qemu_ram_get_idstr since it's the actual name text sent on the
wire.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
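Illustration only, not part of the patch: a self-contained sketch of the
lookup described in the commit message, translating a host pointer into a
block, an offset within that block and a global address.  SketchBlock,
sketch_block_from_host and the 4 KiB SKETCH_PAGE_MASK are simplified
stand-ins assumed for the example; the real function walks QEMU's RCU
protected RAMBlock list and has a separate Xen path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_MASK (~(uint64_t)0xfff)   /* assumes 4 KiB target pages */

/* Simplified stand-in for RAMBlock; only the fields the lookup needs. */
typedef struct SketchBlock {
    const char *idstr;   /* name that would be sent on the wire */
    uint8_t *host;       /* start of the host mapping */
    uint64_t offset;     /* start within the global ram_addr_t space */
    uint64_t length;     /* size of the block in bytes */
} SketchBlock;

/* Returns the block containing 'ptr', or NULL if no block contains it. */
static SketchBlock *sketch_block_from_host(SketchBlock *blocks, size_t n,
                                           void *ptr, bool round_offset,
                                           uint64_t *ram_addr,
                                           uint64_t *offset)
{
    uintptr_t p = (uintptr_t)ptr;

    assert(ram_addr && offset);
    for (size_t i = 0; i < n; i++) {
        SketchBlock *b = &blocks[i];
        uintptr_t base = (uintptr_t)b->host;

        if (p >= base && (uint64_t)(p - base) < b->length) {
            *offset = p - base;
            if (round_offset) {
                *offset &= SKETCH_PAGE_MASK;   /* round down to a page */
            }
            *ram_addr = b->offset + *offset;
            return b;
        }
    }
    return NULL;
}
```

A postcopy fault handler would then send the block's idstr plus the rounded
offset, rather than the ram_addr, since the latter may differ between hosts.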
 exec.c                    | 54 +++++++++++++++++++++++++++++++++++++++--------
 include/exec/cpu-all.h    |  2 --
 include/exec/cpu-common.h |  3 +++
 include/qemu/typedefs.h   |  1 +
 4 files changed, 49 insertions(+), 11 deletions(-)

diff --git a/exec.c b/exec.c
index 81a4481..d235001 100644
--- a/exec.c
+++ b/exec.c
@@ -1265,6 +1265,11 @@ static RAMBlock *find_ram_block(ram_addr_t addr)
     return NULL;
 }
 
+const char *qemu_ram_get_idstr(RAMBlock *rb)
+{
+    return rb->idstr;
+}
+
 /* Called with iothread lock held.  */
 void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
 {
@@ -1755,8 +1760,16 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
     }
 }
 
-/* Some of the softmmu routines need to translate from a host pointer
- * (typically a TLB entry) back to a ram offset.
+/*
+ * Translates a host ptr back to a RAMBlock, a ram_addr and an offset
+ * in that RAMBlock.
+ *
+ * ptr: Host pointer to look up
+ * round_offset: If true round the result offset down to a page boundary
+ * *ram_addr: set to result ram_addr
+ * *offset: set to result offset within the RAMBlock
+ *
+ * Returns: RAMBlock (or NULL if not found)
  *
  * By the time this function returns, the returned pointer is not protected
  * by RCU anymore.  If the caller is not within an RCU critical section and
@@ -1764,18 +1777,22 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
  * pointer, such as a reference to the region that includes the incoming
  * ram_addr_t.
  */
-MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
+RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
+                                   ram_addr_t *ram_addr,
+                                   ram_addr_t *offset)
 {
     RAMBlock *block;
     uint8_t *host = ptr;
-    MemoryRegion *mr;
 
     if (xen_enabled()) {
         rcu_read_lock();
         *ram_addr = xen_ram_addr_from_mapcache(ptr);
-        mr = qemu_get_ram_block(*ram_addr)->mr;
+        block = qemu_get_ram_block(*ram_addr);
+        if (block) {
+            *offset = (host - block->host);
+        }
         rcu_read_unlock();
-        return mr;
+        return block;
     }
 
     rcu_read_lock();
@@ -1798,10 +1815,29 @@ MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
     return NULL;
 
 found:
-    *ram_addr = block->offset + (host - block->host);
-    mr = block->mr;
+    *offset = (host - block->host);
+    if (round_offset) {
+        *offset &= TARGET_PAGE_MASK;
+    }
+    *ram_addr = block->offset + *offset;
     rcu_read_unlock();
-    return mr;
+    return block;
+}
+
+/* Some of the softmmu routines need to translate from a host pointer
+   (typically a TLB entry) back to a ram offset.  */
+MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
+{
+    RAMBlock *block;
+    ram_addr_t offset; /* Not used */
+
+    block = qemu_ram_block_from_host(ptr, false, ram_addr, &offset);
+
+    if (!block) {
+        return NULL;
+    }
+
+    return block->mr;
 }
 
 static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index ac06c67..1f336e6 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -266,8 +266,6 @@ CPUArchState *cpu_copy(CPUArchState *env);
 
 /* memory API */
 
-typedef struct RAMBlock RAMBlock;
-
 struct RAMBlock {
     struct rcu_head rcu;
     struct MemoryRegion *mr;
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index de8a720..f1a045b 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -62,8 +62,11 @@ typedef uint32_t CPUReadMemoryFunc(void *opaque, hwaddr addr);
 void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
 /* This should not be used by devices.  */
 MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
+RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
+                                   ram_addr_t *ram_addr, ram_addr_t *offset);
 void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev);
 void qemu_ram_unset_idstr(ram_addr_t addr);
+const char *qemu_ram_get_idstr(RAMBlock *rb);
 
 void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
                             int len, int is_write);
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 6fdcbcd..fc7c70e 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -70,6 +70,7 @@ typedef struct QEMUSGList QEMUSGList;
 typedef struct QEMUSizedBuffer QEMUSizedBuffer;
 typedef struct QEMUTimerListGroup QEMUTimerListGroup;
 typedef struct QEMUTimer QEMUTimer;
+typedef struct RAMBlock RAMBlock;
 typedef struct Range Range;
 typedef struct SerialState SerialState;
 typedef struct SHPCDevice SHPCDevice;
-- 
2.4.3


* [Qemu-devel] [PATCH v7 05/42] Add qemu_get_buffer_less_copy to avoid copies some of the time
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (3 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 04/42] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-06-17 11:57   ` Juan Quintela
  2015-07-13  9:08   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 06/42] Add wrapper for setting blocking status on a QEMUFile Dr. David Alan Gilbert (git)
                   ` (37 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

qemu_get_buffer always copies the data it reads to a user's buffer,
however in many cases the file buffer inside qemu_file could be given
back to the caller, avoiding the copy.  This isn't always possible
depending on the size and alignment of the data.

Thus 'qemu_get_buffer_less_copy' either copies the data to a supplied
buffer or updates a pointer to the internal buffer if convenient.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
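Illustration only, not part of the patch: a self-contained sketch of the
contract described in the commit message, using an in-memory stand-in for a
QEMUFile.  SketchFile, sketch_peek, sketch_get_buffer_less_copy and the tiny
SKETCH_BUF_SIZE are all assumptions made for the example.  If the whole
request fits in one peek, *buf is redirected to the internal data; otherwise
the data is copied piecewise into the caller's original buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BUF_SIZE 8   /* tiny "internal buffer" to exercise both paths */

/* Hypothetical in-memory stand-in for a QEMUFile. */
typedef struct SketchFile {
    uint8_t *data;   /* backing stream */
    size_t len;      /* total bytes in the stream */
    size_t pos;      /* current read position */
} SketchFile;

/* Peek at up to 'size' bytes (capped at SKETCH_BUF_SIZE); consumes nothing. */
static size_t sketch_peek(SketchFile *f, uint8_t **src, size_t size)
{
    size_t avail = f->len - f->pos;

    if (size > SKETCH_BUF_SIZE) {
        size = SKETCH_BUF_SIZE;
    }
    if (size > avail) {
        size = avail;
    }
    *src = f->data + f->pos;
    return size;
}

/* Returns the number of bytes read; short only at end of stream. */
static size_t sketch_get_buffer_less_copy(SketchFile *f, uint8_t **buf,
                                          size_t size)
{
    uint8_t *dst = *buf;   /* remember the caller's buffer */
    size_t done = 0;

    assert(f && buf && *buf);
    while (done < size) {
        uint8_t *src;
        size_t res = sketch_peek(f, &src, size - done);

        if (res == 0) {
            break;
        }
        f->pos += res;
        if (done == 0 && res == size) {
            *buf = src;    /* whole request in one chunk: no copy */
            return res;
        }
        memcpy(dst + done, src, res);
        done += res;
    }
    return done;
}
```

Because *buf may be redirected, a caller that allocated the buffer should keep
its own pointer for freeing it, as the comment in the patch also warns.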
 include/migration/qemu-file.h |  2 ++
 migration/qemu-file.c         | 47 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 4f67d79..29a9d69 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -162,6 +162,8 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size);
 ssize_t qemu_put_compression_data(QEMUFile *f, const uint8_t *p, size_t size,
                                   int level);
 int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src);
+int qemu_get_buffer_less_copy(QEMUFile *f, uint8_t **buf, int size);
+
 /*
  * Note that you can only peek continuous bytes from where the current pointer
  * is; you aren't guaranteed to be able to peak to +n bytes unless you've
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 965a757..c111a6b 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -427,6 +427,53 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
 }
 
 /*
+ * Read 'size' bytes of data from the file.
+ * 'size' can be larger than the internal buffer.
+ *
+ * The data:
+ *   may be held on an internal buffer (in which case *buf is updated
+ *     to point to it) that is valid until the next qemu_file operation.
+ * OR
+ *   will be copied to the *buf that was passed in.
+ *
+ * The code tries to avoid the copy if possible.
+ *
+ * It will return size bytes unless there was an error, in which case it will
+ * return as many as it managed to read (assuming blocking fd's which
+ * all current QEMUFile are)
+ *
+ * Note: Since **buf may get changed, the caller should take care to
+ *       keep a pointer to the original buffer if it needs to deallocate it.
+ */
+int qemu_get_buffer_less_copy(QEMUFile *f, uint8_t **buf, int size)
+{
+    int pending = size;
+    int done = 0;
+    bool first = true;
+
+    while (pending > 0) {
+        int res;
+        uint8_t *src;
+
+        res = qemu_peek_buffer(f, &src, MIN(pending, IO_BUF_SIZE), 0);
+        if (res == 0) {
+            return done;
+        }
+        qemu_file_skip(f, res);
+        done += res;
+        pending -= res;
+        if (first && res == size) {
+            *buf = src;
+            break;
+        }
+        first = false;
+        /* Copy into the caller's buffer; 'done' already includes 'res' */
+        memcpy(*buf + done - res, src, res);
+    }
+    return done;
+}
+
+/*
  * Peeks a single byte from the buffer; this isn't guaranteed to work if
  * offset leaves a gap after the previous read/peeked data.
  */
-- 
2.4.3


* [Qemu-devel] [PATCH v7 06/42] Add wrapper for setting blocking status on a QEMUFile
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (4 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 05/42] Add qemu_get_buffer_less_copy to avoid copies some of the time Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-06-17 11:59   ` Juan Quintela
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 07/42] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
                   ` (36 subsequent siblings)
  42 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a wrapper to change the blocking status on a QEMUFile
rather than having to use qemu_set_block(qemu_get_fd(f));
it seems best to avoid exposing the fd since not all QEMUFiles
really have one.  With this wrapper we could move the implementation
down to be different on different transports.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
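Illustration only, not part of the patch: a sketch of what switching the
blocking state amounts to on a POSIX file descriptor, using fcntl() and
O_NONBLOCK.  The helper name sketch_fd_change_blocking and its return
convention are assumptions made for the example; the patch itself goes
through qemu_set_block()/qemu_set_nonblock().

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 on success, -1 on error (errno is set). */
static int sketch_fd_change_blocking(int fd, bool block)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1) {
        return -1;
    }
    if (block) {
        flags &= ~O_NONBLOCK;
    } else {
        flags |= O_NONBLOCK;
    }
    return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
}
```

The flag is a property of the open file, not of the direction of use, which
is consistent with the note in the patch that changing the blocking state of
the main QEMUFile can also affect the return path.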
 include/migration/qemu-file.h |  1 +
 migration/qemu-file.c         | 15 +++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 29a9d69..d43c835 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -193,6 +193,7 @@ int qemu_file_get_error(QEMUFile *f);
 void qemu_file_set_error(QEMUFile *f, int ret);
 int qemu_file_shutdown(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
+void qemu_file_change_blocking(QEMUFile *f, bool block);
 
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
 {
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index c111a6b..c746129 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -651,3 +651,18 @@ size_t qemu_get_counted_string(QEMUFile *f, char buf[256])
 
     return res == len ? res : 0;
 }
+
+/*
+ * Change the blocking state of the QEMUFile.
+ * Note: On some transports the OS only keeps a single blocking state for
+ *       both directions, and thus changing the blocking on the main
+ *       QEMUFile can also affect the return path.
+ */
+void qemu_file_change_blocking(QEMUFile *f, bool block)
+{
+    if (block) {
+        qemu_set_block(qemu_get_fd(f));
+    } else {
+        qemu_set_nonblock(qemu_get_fd(f));
+    }
+}
-- 
2.4.3
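
A note on the comment in qemu_file_change_blocking() above: on POSIX
systems O_NONBLOCK is a file status flag held on the open file
description, so every descriptor that shares that description (for
example one made with dup()) sees the same blocking state.  The sketch
below is only an illustration of that behaviour; fd_change_blocking()
and fd_is_blocking() are hypothetical helpers written for this note, not
QEMU functions, and qemu_set_block()/qemu_set_nonblock() may differ in
detail.

```c
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of QEMU): set or clear O_NONBLOCK on fd.
 * Returns 0 on success, -1 on failure with errno set by fcntl().
 */
static int fd_change_blocking(int fd, bool block)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1) {
        return -1;
    }
    if (block) {
        flags &= ~O_NONBLOCK;
    } else {
        flags |= O_NONBLOCK;
    }
    return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
}

/* Hypothetical helper: 1 if fd is blocking, 0 if non-blocking, -1 on error. */
static int fd_is_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1) {
        return -1;
    }
    return (flags & O_NONBLOCK) ? 0 : 1;
}
```

With a pipe, switching the read end to non-blocking through one
descriptor is visible through a dup()ed copy of it, while the write end
(a separate open file description) is unaffected.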


* [Qemu-devel] [PATCH v7 07/42] ram_debug_dump_bitmap: Dump a migration bitmap as text
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (5 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 06/42] Add wrapper for setting blocking status on a QEMUFile Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-06-17 12:17   ` Juan Quintela
  2015-07-13  9:12   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 08/42] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
                   ` (35 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Useful for debugging the migration bitmap and other bitmaps
of the same format (including the sentmap in postcopy).

The bitmap is printed to stderr.
Lines in which every bit has the expected value are skipped, so the
output can be quite compact for many bitmaps.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  1 +
 migration/ram.c               | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 9387c8c..b3a7f75 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -144,6 +144,7 @@ uint64_t xbzrle_mig_pages_cache_miss(void);
 double xbzrle_mig_cache_miss_rate(void);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
+void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
diff --git a/migration/ram.c b/migration/ram.c
index 57368e1..efc215a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1051,6 +1051,44 @@ static void reset_ram_globals(void)
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
 
+/*
+ * 'expected' is the value you expect the bitmap mostly to be full
+ * of; it won't bother printing lines that are all this value.
+ * If 'todump' is null the migration bitmap is dumped.
+ */
+void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
+{
+    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+    int64_t cur;
+    int64_t linelen = 128;
+    char linebuf[129];
+
+    if (!todump) {
+        todump = migration_bitmap;
+    }
+
+    for (cur = 0; cur < ram_pages; cur += linelen) {
+        int64_t curb;
+        bool found = false;
+        /*
+         * Last line; catch the case where the line length
+         * is longer than remaining ram
+         */
+        if (cur + linelen > ram_pages) {
+            linelen = ram_pages - cur;
+        }
+        for (curb = 0; curb < linelen; curb++) {
+            bool thisbit = test_bit(cur + curb, todump);
+            linebuf[curb] = thisbit ? '1' : '.';
+            found = found || (thisbit != expected);
+        }
+        if (found) {
+            linebuf[curb] = '\0';
+            fprintf(stderr,  "0x%08" PRIx64 " : %s\n", cur, linebuf);
+        }
+    }
+}
 
 /* Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
-- 
2.4.3
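
To make the output format of ram_debug_dump_bitmap() concrete, here is a
standalone sketch of the same idea: walk the bitmap one line at a time,
render each bit as '1' or '.', and emit the line only if some bit differs
from 'expected'.  It is not the QEMU code: dump_bitmap() and bit_is_set()
are hypothetical names, the bitmap is a byte array read LSB-first rather
than the unsigned long array that test_bit() uses, the line length is 8
bits instead of 128, and the text goes to a caller-supplied buffer rather
than stderr.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define LINE_BITS 8 /* the patch uses 128; kept short for the example */

/* Hypothetical stand-in for test_bit(): byte array, LSB-first. */
static bool bit_is_set(const unsigned char *map, size_t bit)
{
    return (map[bit / 8] >> (bit % 8)) & 1;
}

/*
 * Render 'map' (nbits long) into 'out'.  Lines whose bits all equal
 * 'expected' are skipped.  Returns the number of lines written; stops
 * early if 'out' is too small for the next line.
 */
static int dump_bitmap(const unsigned char *map, size_t nbits, bool expected,
                       char *out, size_t outlen)
{
    size_t used = 0;
    int lines = 0;

    if (outlen) {
        out[0] = '\0';
    }
    for (size_t cur = 0; cur < nbits; cur += LINE_BITS) {
        /* Last line may be shorter than LINE_BITS */
        size_t len = nbits - cur < LINE_BITS ? nbits - cur : LINE_BITS;
        char line[LINE_BITS + 1];
        bool found = false;

        for (size_t b = 0; b < len; b++) {
            bool thisbit = bit_is_set(map, cur + b);

            line[b] = thisbit ? '1' : '.';
            found = found || (thisbit != expected);
        }
        line[len] = '\0';
        if (found) {
            int n = snprintf(out + used, outlen - used,
                             "0x%08zx : %s\n", cur, line);

            if (n < 0 || (size_t)n >= outlen - used) {
                break;
            }
            used += (size_t)n;
            lines++;
        }
    }
    return lines;
}
```

For a 20-bit map whose only set bits are 8 and 10, calling this with
expected=false produces a single line for offset 8; the all-zero lines
before and after it are suppressed.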


* [Qemu-devel] [PATCH v7 08/42] migrate_init: Call from savevm
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (6 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 07/42] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-06-17 12:18   ` Juan Quintela
  2015-07-13  9:13   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 09/42] Rename save_live_complete to save_live_complete_precopy Dr. David Alan Gilbert (git)
                   ` (34 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Suspend to file is very much like a migrate, and it makes life
easier if we have the Migration state available, so initialise it
in the savevm.c code for suspending.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 include/migration/migration.h | 4 +---
 include/qemu/typedefs.h       | 1 +
 migration/migration.c         | 2 +-
 migration/savevm.c            | 2 ++
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index b3a7f75..414c5cf 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -41,10 +41,7 @@ struct MigrationParams {
     bool shared;
 };
 
-typedef struct MigrationState MigrationState;
-
 typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
-
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *file;
@@ -115,6 +112,7 @@ int migrate_fd_close(MigrationState *s);
 
 void add_migration_state_change_notifier(Notifier *notify);
 void remove_migration_state_change_notifier(Notifier *notify);
+MigrationState *migrate_init(const MigrationParams *params);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index fc7c70e..8403856 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -41,6 +41,7 @@ typedef struct MemoryRegion MemoryRegion;
 typedef struct MemoryRegionSection MemoryRegionSection;
 typedef struct MigrationIncomingState MigrationIncomingState;
 typedef struct MigrationParams MigrationParams;
+typedef struct MigrationState MigrationState;
 typedef struct Monitor Monitor;
 typedef struct MouseTransformInfo MouseTransformInfo;
 typedef struct MSIMessage MSIMessage;
diff --git a/migration/migration.c b/migration/migration.c
index b04b457..3cd7f4b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -489,7 +489,7 @@ bool migration_has_failed(MigrationState *s)
             s->state == MIGRATION_STATUS_FAILED);
 }
 
-static MigrationState *migrate_init(const MigrationParams *params)
+MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
     int64_t bandwidth_limit = s->bandwidth_limit;
diff --git a/migration/savevm.c b/migration/savevm.c
index 2091882..aaf8c5c 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -903,6 +903,8 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
         .blk = 0,
         .shared = 0
     };
+    MigrationState *ms = migrate_init(&params);
+    ms->file = f;
 
     if (qemu_savevm_state_blocked(errp)) {
         return -EINVAL;
-- 
2.4.3


* [Qemu-devel] [PATCH v7 09/42] Rename save_live_complete to save_live_complete_precopy
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (7 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 08/42] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-06-17 12:20   ` Juan Quintela
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 10/42] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
                   ` (33 subsequent siblings)
  42 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

In postcopy we're going to need to perform the complete phase
for postcopiable devices at a different point; start out by
renaming all of the 'complete's to make the difference obvious.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
---
 hw/ppc/spapr.c              |  2 +-
 include/migration/vmstate.h |  2 +-
 include/sysemu/sysemu.h     |  2 +-
 migration/block.c           |  2 +-
 migration/migration.c       |  2 +-
 migration/ram.c             |  2 +-
 migration/savevm.c          | 10 +++++-----
 trace-events                |  2 +-
 8 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index f174e5a..2f8155d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1375,7 +1375,7 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id)
 static SaveVMHandlers savevm_htab_handlers = {
     .save_live_setup = htab_save_setup,
     .save_live_iterate = htab_save_iterate,
-    .save_live_complete = htab_save_complete,
+    .save_live_complete_precopy = htab_save_complete,
     .load_state = htab_load,
 };
 
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 7153b1e..074747c 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -40,7 +40,7 @@ typedef struct SaveVMHandlers {
     SaveStateHandler *save_state;
 
     void (*cancel)(void *opaque);
-    int (*save_live_complete)(QEMUFile *f, void *opaque);
+    int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);
 
     /* This runs both outside and inside the iothread lock.  */
     bool (*is_active)(void *opaque);
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 7e42f76..6dae2db 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -87,7 +87,7 @@ void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params);
 void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f);
-void qemu_savevm_state_complete(QEMUFile *f);
+void qemu_savevm_state_complete_precopy(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
 uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
 int qemu_loadvm_state(QEMUFile *f);
diff --git a/migration/block.c b/migration/block.c
index ddb59cc..3005668 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -877,7 +877,7 @@ static SaveVMHandlers savevm_block_handlers = {
     .set_params = block_set_params,
     .save_live_setup = block_save_setup,
     .save_live_iterate = block_save_iterate,
-    .save_live_complete = block_save_complete,
+    .save_live_complete_precopy = block_save_complete,
     .save_live_pending = block_save_pending,
     .load_state = block_load,
     .cancel = block_migration_cancel,
diff --git a/migration/migration.c b/migration/migration.c
index 3cd7f4b..295f15a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -794,7 +794,7 @@ static void *migration_thread(void *opaque)
                 ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
                 if (ret >= 0) {
                     qemu_file_set_rate_limit(s->file, INT64_MAX);
-                    qemu_savevm_state_complete(s->file);
+                    qemu_savevm_state_complete_precopy(s->file);
                 }
                 qemu_mutex_unlock_iothread();
 
diff --git a/migration/ram.c b/migration/ram.c
index efc215a..492ed8a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1604,7 +1604,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
-    .save_live_complete = ram_save_complete,
+    .save_live_complete_precopy = ram_save_complete,
     .save_live_pending = ram_save_pending,
     .load_state = ram_load,
     .cancel = ram_migration_cancel,
diff --git a/migration/savevm.c b/migration/savevm.c
index aaf8c5c..f9168ac 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -793,19 +793,19 @@ static bool should_send_vmdesc(void)
     return !machine->suppress_vmdesc;
 }
 
-void qemu_savevm_state_complete(QEMUFile *f)
+void qemu_savevm_state_complete_precopy(QEMUFile *f)
 {
     QJSON *vmdesc;
     int vmdesc_len;
     SaveStateEntry *se;
     int ret;
 
-    trace_savevm_state_complete();
+    trace_savevm_state_complete_precopy();
 
     cpu_synchronize_all_states();
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-        if (!se->ops || !se->ops->save_live_complete) {
+        if (!se->ops || !se->ops->save_live_complete_precopy) {
             continue;
         }
         if (se->ops && se->ops->is_active) {
@@ -817,7 +817,7 @@ void qemu_savevm_state_complete(QEMUFile *f)
 
         save_section_header(f, se, QEMU_VM_SECTION_END);
 
-        ret = se->ops->save_live_complete(f, se->opaque);
+        ret = se->ops->save_live_complete_precopy(f, se->opaque);
         trace_savevm_section_end(se->idstr, se->section_id, ret);
         save_section_footer(f, se);
         if (ret < 0) {
@@ -923,7 +923,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
 
     ret = qemu_file_get_error(f);
     if (ret == 0) {
-        qemu_savevm_state_complete(f);
+        qemu_savevm_state_complete_precopy(f);
         ret = qemu_file_get_error(f);
     }
     if (ret != 0) {
diff --git a/trace-events b/trace-events
index 1abca7a..d539528 100644
--- a/trace-events
+++ b/trace-events
@@ -1188,7 +1188,7 @@ savevm_section_end(const char *id, unsigned int section_id, int ret) "%s, sectio
 savevm_state_begin(void) ""
 savevm_state_header(void) ""
 savevm_state_iterate(void) ""
-savevm_state_complete(void) ""
+savevm_state_complete_precopy(void) ""
 savevm_state_cancel(void) ""
 vmstate_save(const char *idstr, const char *vmsd_name) "%s, %s"
 vmstate_load(const char *idstr, const char *vmsd_name) "%s, %s"
-- 
2.4.3


* [Qemu-devel] [PATCH v7 10/42] Return path: Open a return path on QEMUFile for sockets
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (8 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 09/42] Rename save_live_complete to save_live_complete_precopy Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-06-17 12:23   ` Juan Quintela
  2015-07-13 10:12   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 11/42] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
                   ` (32 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy needs a method to send messages from the destination back to
the source; this is the 'return path'.

Wire it up for 'socket' QEMUFiles.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/qemu-file.h |  7 +++++
 migration/qemu-file-unix.c    | 69 +++++++++++++++++++++++++++++++++++++------
 migration/qemu-file.c         | 12 ++++++++
 3 files changed, 79 insertions(+), 9 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index d43c835..7721c42 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -85,6 +85,11 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
                                uint64_t *bytes_sent);
 
 /*
+ * Return a QEMUFile for comms in the opposite direction
+ */
+typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
+
+/*
  * Stop any read or write (depending on flags) on the underlying
  * transport on the QEMUFile.
  * Existing blocking reads/writes must be woken
@@ -102,6 +107,7 @@ typedef struct QEMUFileOps {
     QEMURamHookFunc *after_ram_iterate;
     QEMURamHookFunc *hook_ram_load;
     QEMURamSaveFunc *save_page;
+    QEMURetPathFunc *get_return_path;
     QEMUFileShutdownFunc *shut_down;
 } QEMUFileOps;
 
@@ -192,6 +198,7 @@ int64_t qemu_file_get_rate_limit(QEMUFile *f);
 int qemu_file_get_error(QEMUFile *f);
 void qemu_file_set_error(QEMUFile *f, int ret);
 int qemu_file_shutdown(QEMUFile *f);
+QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_file_change_blocking(QEMUFile *f, bool block);
 
diff --git a/migration/qemu-file-unix.c b/migration/qemu-file-unix.c
index bfbc086..561621e 100644
--- a/migration/qemu-file-unix.c
+++ b/migration/qemu-file-unix.c
@@ -96,6 +96,56 @@ static int socket_shutdown(void *opaque, bool rd, bool wr)
     }
 }
 
+static int socket_return_close(void *opaque)
+{
+    QEMUFileSocket *s = opaque;
+    /*
+     * Note: We don't close the socket, that should be done by the forward
+     * path.
+     */
+    g_free(s);
+    return 0;
+}
+
+static const QEMUFileOps socket_return_read_ops = {
+    .get_fd          = socket_get_fd,
+    .get_buffer      = socket_get_buffer,
+    .close           = socket_return_close,
+    .shut_down       = socket_shutdown,
+};
+
+static const QEMUFileOps socket_return_write_ops = {
+    .get_fd          = socket_get_fd,
+    .writev_buffer   = socket_writev_buffer,
+    .close           = socket_return_close,
+    .shut_down       = socket_shutdown,
+};
+
+/*
+ * Give a QEMUFile* off the same socket but data in the opposite
+ * direction.
+ */
+static QEMUFile *socket_dup_return_path(void *opaque)
+{
+    QEMUFileSocket *forward = opaque;
+    QEMUFileSocket *reverse;
+
+    if (qemu_file_get_error(forward->file)) {
+        /* If the forward file is in error, don't try and open a return */
+        return NULL;
+    }
+
+    reverse = g_malloc0(sizeof(QEMUFileSocket));
+    reverse->fd = forward->fd;
+    /* I don't think there's a better way to tell which direction 'this' is */
+    if (forward->file->ops->get_buffer != NULL) {
+        /* being called from the read side, so we need to be able to write */
+        return qemu_fopen_ops(reverse, &socket_return_write_ops);
+    } else {
+        return qemu_fopen_ops(reverse, &socket_return_read_ops);
+    }
+}
+
 static ssize_t unix_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
                                   int64_t pos)
 {
@@ -204,18 +254,19 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
 }
 
 static const QEMUFileOps socket_read_ops = {
-    .get_fd     = socket_get_fd,
-    .get_buffer = socket_get_buffer,
-    .close      = socket_close,
-    .shut_down  = socket_shutdown
-
+    .get_fd          = socket_get_fd,
+    .get_buffer      = socket_get_buffer,
+    .close           = socket_close,
+    .shut_down       = socket_shutdown,
+    .get_return_path = socket_dup_return_path
 };
 
 static const QEMUFileOps socket_write_ops = {
-    .get_fd        = socket_get_fd,
-    .writev_buffer = socket_writev_buffer,
-    .close         = socket_close,
-    .shut_down     = socket_shutdown
+    .get_fd          = socket_get_fd,
+    .writev_buffer   = socket_writev_buffer,
+    .close           = socket_close,
+    .shut_down       = socket_shutdown,
+    .get_return_path = socket_dup_return_path
 };
 
 QEMUFile *qemu_fopen_socket(int fd, const char *mode)
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index c746129..7d9d983 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -43,6 +43,18 @@ int qemu_file_shutdown(QEMUFile *f)
     return f->ops->shut_down(f->opaque, true, true);
 }
 
+/*
+ * Result: QEMUFile* for a 'return path' for comms in the opposite direction
+ *         NULL if not available
+ */
+QEMUFile *qemu_file_get_return_path(QEMUFile *f)
+{
+    if (!f->ops->get_return_path) {
+        return NULL;
+    }
+    return f->ops->get_return_path(f->opaque);
+}
+
 bool qemu_file_mode_is_not_valid(const char *mode)
 {
     if (mode == NULL ||
-- 
2.4.3
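
The shape of the return-path hook in this patch can be sketched outside
QEMU as follows: an optional callback in the ops table, a wrapper that
returns NULL when the transport has no such callback, and a socket
implementation that reuses the same fd in the opposite direction while
leaving ownership of the fd with the forward channel.  Everything here
(Chan, ChanOps, the can_read/can_write/owns_fd flags) is a hypothetical
simplification for illustration; the real code uses QEMUFile,
QEMUFileOps and function pointers such as get_buffer to tell the
direction.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* All names below are hypothetical stand-ins for this sketch. */
typedef struct Chan Chan;

typedef struct ChanOps {
    bool can_read;                     /* stands in for .get_buffer     */
    bool can_write;                    /* stands in for .writev_buffer  */
    bool owns_fd;                      /* closing this channel closes fd */
    Chan *(*get_return_path)(Chan *c); /* optional, may be NULL         */
} ChanOps;

struct Chan {
    int fd;
    const ChanOps *ops;
};

/* Return-path channels never own the fd and have no return path of their own */
static const ChanOps return_read_ops  = { true,  false, false, NULL };
static const ChanOps return_write_ops = { false, true,  false, NULL };

static Chan *chan_new(int fd, const ChanOps *ops)
{
    Chan *c = malloc(sizeof(*c));

    if (c) {
        c->fd = fd;
        c->ops = ops;
    }
    return c;
}

/* Same fd, opposite direction; the forward channel keeps ownership. */
static Chan *sock_return_path(Chan *forward)
{
    const ChanOps *ops = forward->ops->can_read ? &return_write_ops
                                                : &return_read_ops;
    return chan_new(forward->fd, ops);
}

static const ChanOps forward_read_ops  = { true,  false, true, sock_return_path };
static const ChanOps forward_write_ops = { false, true,  true, sock_return_path };
static const ChanOps plain_file_ops    = { true,  false, true, NULL };

/* Same contract as qemu_file_get_return_path(): NULL if not available. */
static Chan *chan_get_return_path(Chan *c)
{
    if (!c->ops->get_return_path) {
        return NULL;
    }
    return c->ops->get_return_path(c);
}
```

A reader-side forward channel yields a writable return channel on the
same fd, and a transport without the callback simply yields NULL, which
callers have to check for.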


* [Qemu-devel] [PATCH v7 11/42] Return path: socket_writev_buffer: Block even on non-blocking fd's
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (9 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 10/42] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-06-17 12:28   ` Juan Quintela
  2015-07-13 12:37   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 12/42] Migration commands Dr. David Alan Gilbert (git)
                   ` (31 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The destination sets the fd to non-blocking on incoming migrations;
this also affects the return path from the destination, and thus we
need to make sure we can safely write to the return path.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/qemu-file-unix.c | 41 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/migration/qemu-file-unix.c b/migration/qemu-file-unix.c
index 561621e..b6c55ab 100644
--- a/migration/qemu-file-unix.c
+++ b/migration/qemu-file-unix.c
@@ -39,12 +39,43 @@ static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
     QEMUFileSocket *s = opaque;
     ssize_t len;
     ssize_t size = iov_size(iov, iovcnt);
+    ssize_t offset = 0;
+    int     err;
 
-    len = iov_send(s->fd, iov, iovcnt, 0, size);
-    if (len < size) {
-        len = -socket_error();
-    }
-    return len;
+    while (size > 0) {
+        len = iov_send(s->fd, iov, iovcnt, offset, size);
+
+        if (len > 0) {
+            size -= len;
+            offset += len;
+        }
+
+        if (size > 0) {
+            err = socket_error();
+
+            if (err != EAGAIN && err != EWOULDBLOCK) {
+                error_report("socket_writev_buffer: Got err=%d for (%zd/%zd)",
+                             err, size, len);
+                /*
+                 * If I've already sent some but only just got the error, I
+                 * could return the amount validly sent so far and wait for the
+                 * next call to report the error, but I'd rather flag the error
+                 * immediately.
+                 */
+                return -err;
+            }
+
+            /* Emulate blocking */
+            GPollFD pfd;
+
+            pfd.fd = s->fd;
+            pfd.events = G_IO_OUT | G_IO_ERR;
+            pfd.revents = 0;
+            g_poll(&pfd, 1 /* 1 fd */, -1 /* no timeout */);
+        }
+    }
+
+    return offset;
 }
 
 static int socket_get_fd(void *opaque)
-- 
2.4.3


* [Qemu-devel] [PATCH v7 12/42] Migration commands
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (10 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 11/42] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-06-17 12:31   ` Juan Quintela
  2015-07-13 12:45   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 13/42] Return path: Control commands Dr. David Alan Gilbert (git)
                   ` (30 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Create QEMU_VM_COMMAND section type for sending commands from
source to destination.  These commands are not intended to convey
guest state but to control the migration process.

For use in postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  1 +
 include/sysemu/sysemu.h       |  7 +++++++
 migration/savevm.c            | 46 +++++++++++++++++++++++++++++++++++++++++++
 trace-events                  |  1 +
 4 files changed, 55 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 414c5cf..8adaa45 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -34,6 +34,7 @@
 #define QEMU_VM_SECTION_FULL         0x04
 #define QEMU_VM_SUBSECTION           0x05
 #define QEMU_VM_VMDESCRIPTION        0x06
+#define QEMU_VM_COMMAND              0x07
 #define QEMU_VM_SECTION_FOOTER       0x7e
 
 struct MigrationParams {
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 6dae2db..5869607 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -82,6 +82,11 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict);
 
 void qemu_announce_self(void);
 
+/* Subcommands for QEMU_VM_COMMAND */
+enum qemu_vm_cmd {
+    MIG_CMD_INVALID = 0,   /* Must be 0 */
+};
+
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params);
@@ -90,6 +95,8 @@ int qemu_savevm_state_iterate(QEMUFile *f);
 void qemu_savevm_state_complete_precopy(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
 uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
+void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
+                              uint16_t len, uint8_t *data);
 int qemu_loadvm_state(QEMUFile *f);
 
 typedef enum DisplayType
diff --git a/migration/savevm.c b/migration/savevm.c
index f9168ac..7ce9d21 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -686,6 +686,23 @@ static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
     return true;
 }
 
+/* Send a 'QEMU_VM_COMMAND' type element with the command
+ * and associated data.
+ */
+void qemu_savevm_command_send(QEMUFile *f,
+                              enum qemu_vm_cmd command,
+                              uint16_t len,
+                              uint8_t *data)
+{
+    qemu_put_byte(f, QEMU_VM_COMMAND);
+    qemu_put_be16(f, (uint16_t)command);
+    qemu_put_be16(f, len);
+    if (len) {
+        qemu_put_buffer(f, data, len);
+    }
+    qemu_fflush(f);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -982,6 +999,29 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
     return NULL;
 }
 
+/*
+ * Process an incoming 'QEMU_VM_COMMAND'
+ * negative return on error (will issue error message)
+ */
+static int loadvm_process_command(QEMUFile *f)
+{
+    uint16_t cmd;
+    uint16_t len;
+
+    cmd = qemu_get_be16(f);
+    len = qemu_get_be16(f);
+
+    trace_loadvm_process_command(cmd, len);
+    switch (cmd) {
+
+    default:
+        error_report("VM_COMMAND 0x%x unknown (len 0x%x)", cmd, len);
+        return -1;
+    }
+
+    return 0;
+}
+
 struct LoadStateEntry {
     QLIST_ENTRY(LoadStateEntry) entry;
     SaveStateEntry *se;
@@ -1114,6 +1154,12 @@ int qemu_loadvm_state(QEMUFile *f)
                 goto out;
             }
             break;
+        case QEMU_VM_COMMAND:
+            ret = loadvm_process_command(f);
+            if (ret < 0) {
+                goto out;
+            }
+            break;
         default:
             error_report("Unknown savevm section type %d", section_type);
             ret = -EINVAL;
diff --git a/trace-events b/trace-events
index d539528..73a65c3 100644
--- a/trace-events
+++ b/trace-events
@@ -1183,6 +1183,7 @@ virtio_gpu_fence_resp(uint64_t fence) "fence 0x%" PRIx64
 qemu_loadvm_state_section(unsigned int section_type) "%d"
 qemu_loadvm_state_section_partend(uint32_t section_id) "%u"
 qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
+loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
 savevm_section_start(const char *id, unsigned int section_id) "%s, section_id %u"
 savevm_section_end(const char *id, unsigned int section_id, int ret) "%s, section_id %u -> %d"
 savevm_state_begin(void) ""
-- 
2.4.3
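
The wire layout that qemu_savevm_command_send() produces is one section
type byte (QEMU_VM_COMMAND, 0x07), a big-endian 16-bit command, a
big-endian 16-bit length, and then 'len' bytes of payload.  The sketch
below encodes and decodes that framing on a plain byte buffer.  The
helper names cmd_encode() and cmd_decode() are hypothetical, and the
decoder checks the type byte itself, whereas in the patch
qemu_loadvm_state() has already consumed it before calling
loadvm_process_command().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VM_COMMAND  0x07 /* QEMU_VM_COMMAND from the patch */
#define CMD_HDR_LEN 5    /* type byte + be16 command + be16 length */

/*
 * Hypothetical helper: encode one command into 'out'.
 * Returns the number of bytes written, or 0 if 'out' is too small.
 */
static size_t cmd_encode(uint8_t *out, size_t outlen, uint16_t cmd,
                         uint16_t len, const uint8_t *data)
{
    if (outlen < (size_t)CMD_HDR_LEN + len) {
        return 0;
    }
    out[0] = VM_COMMAND;
    out[1] = (uint8_t)(cmd >> 8);
    out[2] = (uint8_t)(cmd & 0xff);
    out[3] = (uint8_t)(len >> 8);
    out[4] = (uint8_t)(len & 0xff);
    if (len) {
        memcpy(out + CMD_HDR_LEN, data, len);
    }
    return (size_t)CMD_HDR_LEN + len;
}

/*
 * Hypothetical helper: decode one command from 'in'.
 * Returns the number of bytes consumed, or 0 if the input is not a
 * complete command section.  '*data' points into 'in'.
 */
static size_t cmd_decode(const uint8_t *in, size_t inlen, uint16_t *cmd,
                         uint16_t *len, const uint8_t **data)
{
    if (inlen < CMD_HDR_LEN || in[0] != VM_COMMAND) {
        return 0;
    }
    *cmd = (uint16_t)((in[1] << 8) | in[2]);
    *len = (uint16_t)((in[3] << 8) | in[4]);
    if (inlen - CMD_HDR_LEN < *len) {
        return 0; /* payload truncated */
    }
    *data = in + CMD_HDR_LEN;
    return (size_t)CMD_HDR_LEN + *len;
}
```

Because the length field is 16 bits wide, a single command can carry at
most 65535 bytes of payload.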


* [Qemu-devel] [PATCH v7 13/42] Return path: Control commands
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (11 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 12/42] Migration commands Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-06-17 12:49   ` Juan Quintela
  2015-07-13 12:55   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 14/42] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
                   ` (29 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add two src->dest commands:
   * OPEN_RETURN_PATH - To request that the destination open the return path
   * PING - Request an acknowledgement from the destination

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 include/migration/migration.h |  2 ++
 include/sysemu/sysemu.h       |  6 ++++-
 migration/savevm.c            | 60 +++++++++++++++++++++++++++++++++++++++++++
 trace-events                  |  2 ++
 4 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 8adaa45..65fe5db 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -47,6 +47,8 @@ typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
 struct MigrationIncomingState {
     QEMUFile *file;
 
+    QEMUFile *return_path;
+
     /* See savevm.c */
     LoadStateEntry_Head loadvm_handlers;
 };
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 5869607..d8875ca 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -84,7 +84,9 @@ void qemu_announce_self(void);
 
 /* Subcommands for QEMU_VM_COMMAND */
 enum qemu_vm_cmd {
-    MIG_CMD_INVALID = 0,   /* Must be 0 */
+    MIG_CMD_INVALID = 0,       /* Must be 0 */
+    MIG_CMD_OPEN_RETURN_PATH,  /* Tell the dest to open the Return path */
+    MIG_CMD_PING,              /* Request a PONG on the RP */
 };
 
 bool qemu_savevm_state_blocked(Error **errp);
@@ -97,6 +99,8 @@ void qemu_savevm_state_cancel(void);
 uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
 void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
+void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
+void qemu_savevm_send_open_return_path(QEMUFile *f);
 int qemu_loadvm_state(QEMUFile *f);
 
 typedef enum DisplayType
diff --git a/migration/savevm.c b/migration/savevm.c
index 7ce9d21..a995014 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -703,6 +703,20 @@ void qemu_savevm_command_send(QEMUFile *f,
     qemu_fflush(f);
 }
 
+void qemu_savevm_send_ping(QEMUFile *f, uint32_t value)
+{
+    uint32_t buf;
+
+    trace_savevm_send_ping(value);
+    buf = cpu_to_be32(value);
+    qemu_savevm_command_send(f, MIG_CMD_PING, sizeof(value), (uint8_t *)&buf);
+}
+
+void qemu_savevm_send_open_return_path(QEMUFile *f)
+{
+    qemu_savevm_command_send(f, MIG_CMD_OPEN_RETURN_PATH, 0, NULL);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -999,20 +1013,66 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
     return NULL;
 }
 
+static int loadvm_process_command_simple_lencheck(const char *name,
+                                                  unsigned int actual,
+                                                  unsigned int expected)
+{
+    if (actual != expected) {
+        error_report("%s received with bad length - expecting %u, got %u",
+                     name, expected, actual);
+        return -1;
+    }
+
+    return 0;
+}
+
 /*
  * Process an incoming 'QEMU_VM_COMMAND'
  * negative return on error (will issue error message)
  */
 static int loadvm_process_command(QEMUFile *f)
 {
+    MigrationIncomingState *mis = migration_incoming_get_current();
     uint16_t cmd;
     uint16_t len;
+    uint32_t tmp32;
 
     cmd = qemu_get_be16(f);
     len = qemu_get_be16(f);
 
     trace_loadvm_process_command(cmd, len);
     switch (cmd) {
+    case MIG_CMD_OPEN_RETURN_PATH:
+        if (loadvm_process_command_simple_lencheck("CMD_OPEN_RETURN_PATH",
+                                                   len, 0)) {
+            return -1;
+        }
+        if (mis->return_path) {
+            error_report("CMD_OPEN_RETURN_PATH called when RP already open");
+            /* Not really a problem, so don't give up */
+            return 0;
+        }
+        mis->return_path = qemu_file_get_return_path(f);
+        if (!mis->return_path) {
+            error_report("CMD_OPEN_RETURN_PATH failed");
+            return -1;
+        }
+        break;
+
+    case MIG_CMD_PING:
+        if (loadvm_process_command_simple_lencheck("CMD_PING", len,
+                                                   sizeof(tmp32))) {
+            return -1;
+        }
+        tmp32 = qemu_get_be32(f);
+        trace_loadvm_process_command_ping(tmp32);
+        if (!mis->return_path) {
+            error_report("CMD_PING (0x%x) received with no return path",
+                         tmp32);
+            return -1;
+        }
+        /* migrate_send_rp_pong(mis, tmp32); TODO: gets added later */
+        break;
 
     default:
         error_report("VM_COMMAND 0x%x unknown (len 0x%x)", cmd, len);
diff --git a/trace-events b/trace-events
index 73a65c3..5967fdf 100644
--- a/trace-events
+++ b/trace-events
@@ -1184,8 +1184,10 @@ qemu_loadvm_state_section(unsigned int section_type) "%d"
 qemu_loadvm_state_section_partend(uint32_t section_id) "%u"
 qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
 loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
+loadvm_process_command_ping(uint32_t val) "%x"
 savevm_section_start(const char *id, unsigned int section_id) "%s, section_id %u"
 savevm_section_end(const char *id, unsigned int section_id, int ret) "%s, section_id %u -> %d"
+savevm_send_ping(uint32_t val) "%x"
 savevm_state_begin(void) ""
 savevm_state_header(void) ""
 savevm_state_iterate(void) ""
-- 
2.4.3

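For readers following the wire format: the sketch below (not part of the patch) illustrates the command framing that loadvm_process_command reads, namely a big-endian 16-bit command, a big-endian 16-bit length, and then the payload. It covers only the bytes after the QEMU_VM_COMMAND section byte. The helpers put_be16, put_be32, get_be16, get_be32, encode_ping, encode_open_return_path and decode_ping are hypothetical names used for illustration and write to a plain byte buffer rather than a QEMUFile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the enum this patch adds to include/sysemu/sysemu.h. */
enum qemu_vm_cmd {
    MIG_CMD_INVALID = 0,
    MIG_CMD_OPEN_RETURN_PATH,
    MIG_CMD_PING,
};

/* Hypothetical helpers (not QEMU API): big-endian stores and loads. */
static size_t put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
    return 2;
}

static size_t put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
    return 4;
}

static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* PING: command, length 4, then the 32-bit value. buf needs 8 bytes. */
static size_t encode_ping(uint8_t *buf, uint32_t value)
{
    size_t off = 0;

    assert(buf != NULL);
    off += put_be16(buf + off, MIG_CMD_PING);
    off += put_be16(buf + off, 4);
    off += put_be32(buf + off, value);
    return off;
}

/* OPEN_RETURN_PATH: command and a zero length, no payload. */
static size_t encode_open_return_path(uint8_t *buf)
{
    size_t off = 0;

    assert(buf != NULL);
    off += put_be16(buf + off, MIG_CMD_OPEN_RETURN_PATH);
    off += put_be16(buf + off, 0);
    return off;
}

/* Returns 0 and stores the value, or -1 if the frame is not a valid PING. */
static int decode_ping(const uint8_t *buf, size_t n, uint32_t *value)
{
    if (n < 8 || get_be16(buf) != MIG_CMD_PING || get_be16(buf + 2) != 4) {
        return -1;
    }
    *value = get_be32(buf + 4);
    return 0;
}
```

The decoder rejects any PING whose length field is not exactly 4, which matches the intent of loadvm_process_command_simple_lencheck in the patch.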

* [Qemu-devel] [PATCH v7 14/42] Return path: Send responses from destination to source
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (12 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 13/42] Return path: Control commands Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-06-17 16:30   ` Juan Quintela
  2015-07-15  7:31   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 15/42] Return path: Source handling of return path Dr. David Alan Gilbert (git)
                   ` (28 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add migrate_send_rp_message to send a message from destination to source
  along the return path.  (It takes a mutex so that it can be called from
  multiple threads.)
Add migrate_send_rp_shut to send a 'SHUT' message to indicate that
  the destination has finished with the RP.
Add migrate_send_rp_pong to send a 'PONG' message in response to a PING;
  use it in the MIG_CMD_PING handler.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h | 17 ++++++++++++++++
 migration/migration.c         | 45 +++++++++++++++++++++++++++++++++++++++++++
 migration/savevm.c            |  2 +-
 trace-events                  |  1 +
 4 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 65fe5db..36caab9 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -42,12 +42,20 @@ struct MigrationParams {
     bool shared;
 };
 
+/* Messages sent on the return path from destination to source */
+enum mig_rp_message_type {
+    MIG_RP_MSG_INVALID = 0,  /* Must be 0 */
+    MIG_RP_MSG_SHUT,         /* sibling will not send any more RP messages */
+    MIG_RP_MSG_PONG,         /* Response to a PING; data (seq: be32 ) */
+};
+
 typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *file;
 
     QEMUFile *return_path;
+    QemuMutex      rp_mutex;    /* We send replies from multiple threads */
 
     /* See savevm.c */
     LoadStateEntry_Head loadvm_handlers;
@@ -179,6 +187,15 @@ int migrate_compress_level(void);
 int migrate_compress_threads(void);
 int migrate_decompress_threads(void);
 
+/* Sending on the return path - generic and then for each message type */
+void migrate_send_rp_message(MigrationIncomingState *mis,
+                             enum mig_rp_message_type message_type,
+                             uint16_t len, void *data);
+void migrate_send_rp_shut(MigrationIncomingState *mis,
+                          uint32_t value);
+void migrate_send_rp_pong(MigrationIncomingState *mis,
+                          uint32_t value);
+
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_load_hook(QEMUFile *f, uint64_t flags);
diff --git a/migration/migration.c b/migration/migration.c
index 295f15a..afb19a1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -85,6 +85,7 @@ MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
     mis_current = g_malloc0(sizeof(MigrationIncomingState));
     mis_current->file = f;
     QLIST_INIT(&mis_current->loadvm_handlers);
+    qemu_mutex_init(&mis_current->rp_mutex);
 
     return mis_current;
 }
@@ -182,6 +183,50 @@ void process_incoming_migration(QEMUFile *f)
     qemu_coroutine_enter(co, f);
 }
 
+/*
+ * Send a message on the return channel back to the source
+ * of the migration.
+ */
+void migrate_send_rp_message(MigrationIncomingState *mis,
+                             enum mig_rp_message_type message_type,
+                             uint16_t len, void *data)
+{
+    trace_migrate_send_rp_message((int)message_type, len);
+    qemu_mutex_lock(&mis->rp_mutex);
+    qemu_put_be16(mis->return_path, (unsigned int)message_type);
+    qemu_put_be16(mis->return_path, len);
+    qemu_put_buffer(mis->return_path, data, len);
+    qemu_fflush(mis->return_path);
+    qemu_mutex_unlock(&mis->rp_mutex);
+}
+
+/*
+ * Send a 'SHUT' message on the return channel with the given value
+ * to indicate that we've finished with the RP.  Non-0 value indicates
+ * error.
+ */
+void migrate_send_rp_shut(MigrationIncomingState *mis,
+                          uint32_t value)
+{
+    uint32_t buf;
+
+    buf = cpu_to_be32(value);
+    migrate_send_rp_message(mis, MIG_RP_MSG_SHUT, sizeof(buf), &buf);
+}
+
+/*
+ * Send a 'PONG' message on the return channel with the given value
+ * (normally in response to a 'PING')
+ */
+void migrate_send_rp_pong(MigrationIncomingState *mis,
+                          uint32_t value)
+{
+    uint32_t buf;
+
+    buf = cpu_to_be32(value);
+    migrate_send_rp_message(mis, MIG_RP_MSG_PONG, sizeof(buf), &buf);
+}
+
 /* amount of nanoseconds we are willing to wait for migration to be down.
  * the choice of nanoseconds is because it is the maximum resolution that
  * get_clock() can achieve. It is an internal measure. All user-visible
diff --git a/migration/savevm.c b/migration/savevm.c
index a995014..d424c2a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1071,7 +1071,7 @@ static int loadvm_process_command(QEMUFile *f)
                          tmp32);
             return -1;
         }
-        /* migrate_send_rp_pong(mis, tmp32); TODO: gets added later */
+        migrate_send_rp_pong(mis, tmp32);
         break;
 
     default:
diff --git a/trace-events b/trace-events
index 5967fdf..5738e3f 100644
--- a/trace-events
+++ b/trace-events
@@ -1399,6 +1399,7 @@ migrate_fd_cleanup(void) ""
 migrate_fd_error(void) ""
 migrate_fd_cancel(void) ""
 migrate_pending(uint64_t size, uint64_t max) "pending size %" PRIu64 " max %" PRIu64
+migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
 migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
 
 # migration/rdma.c
-- 
2.4.3

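As an illustration of the return-path framing described above (a sketch only, not part of the patch): each message is a big-endian 16-bit type, a big-endian 16-bit length and the payload, and the whole write happens under one lock so that frames from different threads cannot interleave. The struct rp_channel and the functions rp_channel_init, rp_send_message and rp_send_pong are hypothetical stand-ins; the real code writes to a QEMUFile under a QemuMutex, whereas this sketch assumes a fixed in-memory buffer and a pthread mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the enum this patch adds to include/migration/migration.h. */
enum mig_rp_message_type {
    MIG_RP_MSG_INVALID = 0,
    MIG_RP_MSG_SHUT,
    MIG_RP_MSG_PONG,
};

/* Hypothetical stand-in for the return-path file: a buffer plus its lock. */
struct rp_channel {
    pthread_mutex_t lock;
    uint8_t data[64];
    size_t used;
};

static void rp_channel_init(struct rp_channel *ch)
{
    assert(ch != NULL);
    pthread_mutex_init(&ch->lock, NULL);
    ch->used = 0;
}

/*
 * Append one framed message.  Header and payload are written while the
 * lock is held, so a concurrent sender cannot split the frame.
 * Returns 0 on success, -1 if the buffer is full.
 */
static int rp_send_message(struct rp_channel *ch,
                           enum mig_rp_message_type type,
                           uint16_t len, const void *payload)
{
    int ret = -1;

    pthread_mutex_lock(&ch->lock);
    if (ch->used + 4 + (size_t)len <= sizeof(ch->data)) {
        uint8_t *p = ch->data + ch->used;

        p[0] = (uint8_t)((unsigned int)type >> 8);
        p[1] = (uint8_t)type;
        p[2] = (uint8_t)(len >> 8);
        p[3] = (uint8_t)len;
        if (len) {
            memcpy(p + 4, payload, len);
        }
        ch->used += 4 + (size_t)len;
        ret = 0;
    }
    pthread_mutex_unlock(&ch->lock);
    return ret;
}

/* PONG carries the 32-bit value from the PING, in big-endian order. */
static int rp_send_pong(struct rp_channel *ch, uint32_t value)
{
    uint8_t be[4] = {
        (uint8_t)(value >> 24), (uint8_t)(value >> 16),
        (uint8_t)(value >> 8), (uint8_t)value,
    };

    return rp_send_message(ch, MIG_RP_MSG_PONG, sizeof(be), be);
}
```

Keeping the lock inside the generic sender means the per-message wrappers need no locking of their own, which is also how the patch arranges migrate_send_rp_shut and migrate_send_rp_pong.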

* [Qemu-devel] [PATCH v7 15/42] Return path: Source handling of return path
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (13 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 14/42] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-13 10:29   ` Juan Quintela
                     ` (2 more replies)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 16/42] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
                   ` (27 subsequent siblings)
  42 siblings, 3 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Open a return path, and handle messages that are received upon it.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |   8 ++
 migration/migration.c         | 177 +++++++++++++++++++++++++++++++++++++++++-
 trace-events                  |  12 +++
 3 files changed, 196 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 36caab9..868f59a 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -77,6 +77,14 @@ struct MigrationState
 
     int state;
     MigrationParams params;
+
+    /* State related to return path */
+    struct {
+        QEMUFile     *file;
+        QemuThread    rp_thread;
+        bool          error;
+    } rp_state;
+
     double mbps;
     int64_t total_time;
     int64_t downtime;
diff --git a/migration/migration.c b/migration/migration.c
index afb19a1..fb2f491 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -278,6 +278,23 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     return params;
 }
 
+/*
+ * Return true if we're already in the middle of a migration
+ * (i.e. any of the active or setup states)
+ */
+static bool migration_already_active(MigrationState *ms)
+{
+    switch (ms->state) {
+    case MIGRATION_STATUS_ACTIVE:
+    case MIGRATION_STATUS_SETUP:
+        return true;
+
+    default:
+        return false;
+
+    }
+}
+
 static void get_xbzrle_cache_stats(MigrationInfo *info)
 {
     if (migrate_use_xbzrle()) {
@@ -441,6 +458,21 @@ static void migrate_set_state(MigrationState *s, int old_state, int new_state)
     }
 }
 
+static void migrate_fd_cleanup_src_rp(MigrationState *ms)
+{
+    QEMUFile *rp = ms->rp_state.file;
+
+    /*
+     * When stuff goes wrong (e.g. failing destination) on the rp, it can get
+     * cleaned up from a few threads; make sure not to do it twice in parallel
+     */
+    rp = atomic_cmpxchg(&ms->rp_state.file, rp, NULL);
+    if (rp) {
+        trace_migrate_fd_cleanup_src_rp();
+        qemu_fclose(rp);
+    }
+}
+
 static void migrate_fd_cleanup(void *opaque)
 {
     MigrationState *s = opaque;
@@ -448,6 +480,8 @@ static void migrate_fd_cleanup(void *opaque)
     qemu_bh_delete(s->cleanup_bh);
     s->cleanup_bh = NULL;
 
+    migrate_fd_cleanup_src_rp(s);
+
     if (s->file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
@@ -487,6 +521,11 @@ static void migrate_fd_cancel(MigrationState *s)
     QEMUFile *f = migrate_get_current()->file;
     trace_migrate_fd_cancel();
 
+    if (s->rp_state.file) {
+        /* shut down the rp socket, which causes the rp thread to exit */
+        qemu_file_shutdown(s->rp_state.file);
+    }
+
     do {
         old_state = s->state;
         if (old_state != MIGRATION_STATUS_SETUP &&
@@ -801,8 +840,144 @@ int64_t migrate_xbzrle_cache_size(void)
     return s->xbzrle_cache_size;
 }
 
-/* migration thread support */
+/*
+ * Something bad happened to the RP stream, mark an error
+ * The caller shall print something to indicate why
+ */
+static void source_return_path_bad(MigrationState *s)
+{
+    s->rp_state.error = true;
+    migrate_fd_cleanup_src_rp(s);
+}
+
+/*
+ * Handles messages sent on the return path towards the source VM
+ *
+ */
+static void *source_return_path_thread(void *opaque)
+{
+    MigrationState *ms = opaque;
+    QEMUFile *rp = ms->rp_state.file;
+    uint16_t expected_len, header_len, header_type;
+    const int max_len = 512;
+    uint8_t buf[max_len];
+    uint32_t tmp32;
+    int res;
+
+    trace_source_return_path_thread_entry();
+    while (rp && !qemu_file_get_error(rp) &&
+        migration_already_active(ms)) {
+        trace_source_return_path_thread_loop_top();
+        header_type = qemu_get_be16(rp);
+        header_len = qemu_get_be16(rp);
+
+        switch (header_type) {
+        case MIG_RP_MSG_SHUT:
+        case MIG_RP_MSG_PONG:
+            expected_len = 4;
+            break;
+
+        default:
+            error_report("RP: Received invalid message 0x%04x length 0x%04x",
+                    header_type, header_len);
+            source_return_path_bad(ms);
+            goto out;
+        }
 
+        if (header_len != expected_len) {
+            error_report("RP: Received message 0x%04x with "
+                    "incorrect length %d expecting %d",
+                    header_type, header_len,
+                    expected_len);
+            source_return_path_bad(ms);
+            goto out;
+        }
+
+        /* We know we've got a valid header by this point */
+        res = qemu_get_buffer(rp, buf, header_len);
+        if (res != header_len) {
+            trace_source_return_path_thread_failed_read_cmd_data();
+            source_return_path_bad(ms);
+            goto out;
+        }
+
+        /* OK, we have the message and the data */
+        switch (header_type) {
+        case MIG_RP_MSG_SHUT:
+            tmp32 = be32_to_cpup((uint32_t *)buf);
+            trace_source_return_path_thread_shut(tmp32);
+            if (tmp32) {
+                error_report("RP: Sibling indicated error %u", tmp32);
+                source_return_path_bad(ms);
+            }
+            /*
+             * We'll let the main thread deal with closing the RP
+             * we could do a shutdown(2) on it, but we're the only user
+             * anyway, so there's nothing gained.
+             */
+            goto out;
+
+        case MIG_RP_MSG_PONG:
+            tmp32 = be32_to_cpup((uint32_t *)buf);
+            trace_source_return_path_thread_pong(tmp32);
+            break;
+
+        default:
+            break;
+        }
+    }
+    if (rp && qemu_file_get_error(rp)) {
+        trace_source_return_path_thread_bad_end();
+        source_return_path_bad(ms);
+    }
+
+    trace_source_return_path_thread_end();
+out:
+    return NULL;
+}
+
+__attribute__ (( unused )) /* Until later in patch series */
+static int open_return_path_on_source(MigrationState *ms)
+{
+
+    ms->rp_state.file = qemu_file_get_return_path(ms->file);
+    if (!ms->rp_state.file) {
+        return -1;
+    }
+
+    trace_open_return_path_on_source();
+    qemu_thread_create(&ms->rp_state.rp_thread, "return path",
+                       source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
+
+    trace_open_return_path_on_source_continue();
+
+    return 0;
+}
+
+__attribute__ (( unused )) /* Until later in patch series */
+/* Returns 0 if the RP was ok, otherwise there was an error on the RP */
+static int await_return_path_close_on_source(MigrationState *ms)
+{
+    /*
+     * If this is a normal exit then the destination will send a SHUT and the
+     * rp_thread will exit, however if there's an error we need to cause
+     * it to exit, which we can do by a shutdown.
+     * (canceling must also shutdown to stop us getting stuck here if
+     * the destination died at just the wrong place)
+     */
+    if (qemu_file_get_error(ms->file) && ms->rp_state.file) {
+        qemu_file_shutdown(ms->rp_state.file);
+    }
+    trace_await_return_path_close_on_source_joining();
+    qemu_thread_join(&ms->rp_state.rp_thread);
+    trace_await_return_path_close_on_source_close();
+    return ms->rp_state.error;
+}
+
+/*
+ * Master migration thread on the source VM.
+ * It drives the migration and pumps the data down the outgoing channel.
+ */
 static void *migration_thread(void *opaque)
 {
     MigrationState *s = opaque;
diff --git a/trace-events b/trace-events
index 5738e3f..282cde1 100644
--- a/trace-events
+++ b/trace-events
@@ -1394,12 +1394,24 @@ flic_no_device_api(int err) "flic: no Device Contral API support %d"
 flic_reset_failed(int err) "flic: reset failed %d"
 
 # migration.c
+await_return_path_close_on_source_close(void) ""
+await_return_path_close_on_source_joining(void) ""
 migrate_set_state(int new_state) "new state %d"
 migrate_fd_cleanup(void) ""
+migrate_fd_cleanup_src_rp(void) ""
 migrate_fd_error(void) ""
 migrate_fd_cancel(void) ""
 migrate_pending(uint64_t size, uint64_t max) "pending size %" PRIu64 " max %" PRIu64
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
+open_return_path_on_source(void) ""
+open_return_path_on_source_continue(void) ""
+source_return_path_thread_bad_end(void) ""
+source_return_path_thread_end(void) ""
+source_return_path_thread_entry(void) ""
+source_return_path_thread_failed_read_cmd_data(void) ""
+source_return_path_thread_loop_top(void) ""
+source_return_path_thread_pong(uint32_t val) "%x"
+source_return_path_thread_shut(uint32_t val) "%x"
 migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
 
 # migration/rdma.c
-- 
2.4.3

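The per-message handling in source_return_path_thread can be pictured with the following sketch, which is not part of the patch. It parses one framed message from a byte buffer: validate the type, check the length, read the payload, then act on it. The names rp_parse_one and enum rp_result are hypothetical, and the sketch assumes an exact length match for the two known message types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the return-path message types used by this series. */
enum mig_rp_message_type {
    MIG_RP_MSG_INVALID = 0,
    MIG_RP_MSG_SHUT,
    MIG_RP_MSG_PONG,
};

/* Hypothetical verdicts for the caller's loop. */
enum rp_result {
    RP_CONTINUE,   /* message handled, keep reading */
    RP_SHUT,       /* clean SHUT received, stop reading */
    RP_ERROR,      /* malformed message or error reported by the peer */
};

/*
 * Parse one message from buf[0..n).  On RP_CONTINUE or RP_SHUT the
 * decoded 32-bit payload is stored in *value.
 */
static enum rp_result rp_parse_one(const uint8_t *buf, size_t n,
                                   uint32_t *value)
{
    uint16_t type, len;

    assert(buf != NULL && value != NULL);
    if (n < 4) {
        return RP_ERROR;            /* header incomplete */
    }
    type = (uint16_t)((buf[0] << 8) | buf[1]);
    len = (uint16_t)((buf[2] << 8) | buf[3]);

    switch (type) {
    case MIG_RP_MSG_SHUT:
    case MIG_RP_MSG_PONG:
        break;                      /* both carry one be32 */
    default:
        return RP_ERROR;            /* unknown message type */
    }
    if (len != 4 || n - 4 < len) {
        return RP_ERROR;            /* wrong length or short read */
    }

    *value = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
             ((uint32_t)buf[6] << 8) | (uint32_t)buf[7];

    if (type == MIG_RP_MSG_SHUT) {
        /* A non-zero SHUT value means the destination hit an error. */
        return *value ? RP_ERROR : RP_SHUT;
    }
    return RP_CONTINUE;
}
```

Validating the header before touching the payload means the payload buffer is only ever read when it is known to hold a complete, correctly sized message.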

* [Qemu-devel] [PATCH v7 16/42] Rework loadvm path for subloops
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (14 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 15/42] Return path: Source handling of return path Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-13 10:33   ` Juan Quintela
  2015-07-15  9:34   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 17/42] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
                   ` (26 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy needs to have two migration streams loading concurrently;
one from memory (with the device state) and the other from the fd
with the memory transactions.

Split the core of qemu_loadvm_state out so we can use it for both.

Allow the inner loadvm loop to quit and cause the parent loops to
exit as well.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |   6 ++
 migration/migration.c         |   2 +
 migration/savevm.c            | 131 +++++++++++++++++++++++-------------------
 trace-events                  |   4 ++
 4 files changed, 83 insertions(+), 60 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 868f59a..1bf78f6 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -54,6 +54,12 @@ typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
 struct MigrationIncomingState {
     QEMUFile *file;
 
+    /*
+     * Free at the start of the main state load, set as the main thread finishes
+     * loading state.
+     */
+    QemuEvent      main_thread_load_event;
+
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
 
diff --git a/migration/migration.c b/migration/migration.c
index fb2f491..a743018 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -86,12 +86,14 @@ MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
     mis_current->file = f;
     QLIST_INIT(&mis_current->loadvm_handlers);
     qemu_mutex_init(&mis_current->rp_mutex);
+    qemu_event_init(&mis_current->main_thread_load_event, false);
 
     return mis_current;
 }
 
 void migration_incoming_state_destroy(void)
 {
+    qemu_event_destroy(&mis_current->main_thread_load_event);
     loadvm_free_handlers(mis_current);
     g_free(mis_current);
     mis_current = NULL;
diff --git a/migration/savevm.c b/migration/savevm.c
index d424c2a..7052a6f 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1013,6 +1013,13 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
     return NULL;
 }
 
+enum LoadVMExitCodes {
+    /* Allow a command to quit all layers of nested loadvm loops */
+    LOADVM_QUIT     =  1,
+};
+
+static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
+
 static int loadvm_process_command_simple_lencheck(const char *name,
                                                   unsigned int actual,
                                                   unsigned int expected)
@@ -1028,7 +1035,9 @@ static int loadvm_process_command_simple_lencheck(const char *name,
 
 /*
  * Process an incoming 'QEMU_VM_COMMAND'
- * negative return on error (will issue error message)
+ * 0           just a normal return
+ * LOADVM_QUIT All good, but exit the loop
+ * <0          Error
  */
 static int loadvm_process_command(QEMUFile *f)
 {
@@ -1099,36 +1108,12 @@ void loadvm_free_handlers(MigrationIncomingState *mis)
     }
 }
 
-int qemu_loadvm_state(QEMUFile *f)
+static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 {
-    MigrationIncomingState *mis = migration_incoming_get_current();
-    Error *local_err = NULL;
     uint8_t section_type;
-    unsigned int v;
     int ret;
-    int file_error_after_eof = -1;
-
-    if (qemu_savevm_state_blocked(&local_err)) {
-        error_report_err(local_err);
-        return -EINVAL;
-    }
-
-    v = qemu_get_be32(f);
-    if (v != QEMU_VM_FILE_MAGIC) {
-        error_report("Not a migration stream");
-        return -EINVAL;
-    }
-
-    v = qemu_get_be32(f);
-    if (v == QEMU_VM_FILE_VERSION_COMPAT) {
-        error_report("SaveVM v2 format is obsolete and don't work anymore");
-        return -ENOTSUP;
-    }
-    if (v != QEMU_VM_FILE_VERSION) {
-        error_report("Unsupported migration stream version");
-        return -ENOTSUP;
-    }
 
+    trace_qemu_loadvm_state_main();
     while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
         uint32_t instance_id, version_id, section_id;
         SaveStateEntry *se;
@@ -1156,16 +1141,14 @@ int qemu_loadvm_state(QEMUFile *f)
             if (se == NULL) {
                 error_report("Unknown savevm section or instance '%s' %d",
                              idstr, instance_id);
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
 
             /* Validate version */
             if (version_id > se->version_id) {
                 error_report("savevm: unsupported version %d for '%s' v%d",
                              version_id, idstr, se->version_id);
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
 
             /* Add entry */
@@ -1180,11 +1163,10 @@ int qemu_loadvm_state(QEMUFile *f)
             if (ret < 0) {
                 error_report("error while loading state for instance 0x%x of"
                              " device '%s'", instance_id, idstr);
-                goto out;
+                return ret;
             }
             if (!check_section_footer(f, le->se)) {
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
             break;
         case QEMU_VM_SECTION_PART:
@@ -1199,58 +1181,87 @@ int qemu_loadvm_state(QEMUFile *f)
             }
             if (le == NULL) {
                 error_report("Unknown savevm section %d", section_id);
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
 
             ret = vmstate_load(f, le->se, le->version_id);
             if (ret < 0) {
                 error_report("error while loading state section id %d(%s)",
                              section_id, le->se->idstr);
-                goto out;
+                return ret;
             }
             if (!check_section_footer(f, le->se)) {
-                ret = -EINVAL;
-                goto out;
+                return -EINVAL;
             }
             break;
         case QEMU_VM_COMMAND:
             ret = loadvm_process_command(f);
-            if (ret < 0) {
-                goto out;
+            trace_qemu_loadvm_state_section_command(ret);
+            if ((ret < 0) || (ret & LOADVM_QUIT)) {
+                return ret;
             }
             break;
         default:
             error_report("Unknown savevm section type %d", section_type);
-            ret = -EINVAL;
-            goto out;
+            return -EINVAL;
         }
     }
 
-    file_error_after_eof = qemu_file_get_error(f);
+    return 0;
+}
 
-    /*
-     * Try to read in the VMDESC section as well, so that dumping tools that
-     * intercept our migration stream have the chance to see it.
-     */
-    if (qemu_get_byte(f) == QEMU_VM_VMDESCRIPTION) {
-        uint32_t size = qemu_get_be32(f);
-        uint8_t *buf = g_malloc(0x1000);
+int qemu_loadvm_state(QEMUFile *f)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    Error *local_err = NULL;
+    unsigned int v;
+    int ret;
 
-        while (size > 0) {
-            uint32_t read_chunk = MIN(size, 0x1000);
-            qemu_get_buffer(f, buf, read_chunk);
-            size -= read_chunk;
-        }
-        g_free(buf);
+    if (qemu_savevm_state_blocked(&local_err)) {
+        error_report_err(local_err);
+        return -EINVAL;
     }
 
-    cpu_synchronize_all_post_init();
+    v = qemu_get_be32(f);
+    if (v != QEMU_VM_FILE_MAGIC) {
+        error_report("Not a migration stream");
+        return -EINVAL;
+    }
+
+    v = qemu_get_be32(f);
+    if (v == QEMU_VM_FILE_VERSION_COMPAT) {
+        error_report("SaveVM v2 format is obsolete and don't work anymore");
+        return -ENOTSUP;
+    }
+    if (v != QEMU_VM_FILE_VERSION) {
+        error_report("Unsupported migration stream version");
+        return -ENOTSUP;
+    }
 
-    ret = 0;
+    ret = qemu_loadvm_state_main(f, mis);
+    qemu_event_set(&mis->main_thread_load_event);
 
-out:
+    trace_qemu_loadvm_state_post_main(ret);
     if (ret == 0) {
+        int file_error_after_eof = qemu_file_get_error(f);
+
+        /*
+         * Try to read in the VMDESC section as well, so that dumping tools that
+         * intercept our migration stream have the chance to see it.
+         */
+        if (qemu_get_byte(f) == QEMU_VM_VMDESCRIPTION) {
+            uint32_t size = qemu_get_be32(f);
+            uint8_t *buf = g_malloc(0x1000);
+
+            while (size > 0) {
+                uint32_t read_chunk = MIN(size, 0x1000);
+                qemu_get_buffer(f, buf, read_chunk);
+                size -= read_chunk;
+            }
+            g_free(buf);
+        }
+
+        cpu_synchronize_all_post_init();
         /* We may not have a VMDESC section, so ignore relative errors */
         ret = file_error_after_eof;
     }
diff --git a/trace-events b/trace-events
index 282cde1..5644cc2 100644
--- a/trace-events
+++ b/trace-events
@@ -1181,7 +1181,11 @@ virtio_gpu_fence_resp(uint64_t fence) "fence 0x%" PRIx64
 
 # migration/savevm.c
 qemu_loadvm_state_section(unsigned int section_type) "%d"
+qemu_loadvm_state_section_command(int ret) "%d"
 qemu_loadvm_state_section_partend(uint32_t section_id) "%u"
+qemu_loadvm_state_main(void) ""
+qemu_loadvm_state_main_quit_parent(void) ""
+qemu_loadvm_state_post_main(int ret) "%d"
 qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
 loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
 loadvm_process_command_ping(uint32_t val) "%x"
-- 
2.4.3

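To show how a quit request can unwind nested load loops, here is a toy sketch that is not part of the patch. A command handler returns LOADVM_QUIT, and every enclosing loop passes that value up instead of continuing. The stream layout, the SEC_* codes and the functions load_main and process_command are hypothetical. The sketch also assumes the inner loop runs on the same stream as its parent, whereas the series intends a separate stream for the inner loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LOADVM_QUIT = 1 };   /* same value as in the patch */

/* Hypothetical one-byte "sections" for the toy stream. */
enum {
    SEC_EOF = 0,         /* ends the current loop normally */
    SEC_DATA = 1,        /* an ordinary section */
    SEC_CMD_NESTED = 2,  /* command: run an inner load loop */
    SEC_CMD_QUIT = 3,    /* command: quit all nested loops */
};

struct stream {
    const uint8_t *p;
    size_t n;
    size_t pos;
    int sections;        /* number of SEC_DATA sections loaded */
};

static int load_main(struct stream *s);

/* 0: carry on, LOADVM_QUIT: all good but leave every loop, <0: error */
static int process_command(struct stream *s, uint8_t cmd)
{
    switch (cmd) {
    case SEC_CMD_NESTED:
        /* The inner loop's verdict is handed straight to the parent. */
        return load_main(s);
    case SEC_CMD_QUIT:
        return LOADVM_QUIT;
    default:
        return -1;
    }
}

static int load_main(struct stream *s)
{
    assert(s != NULL);
    while (s->pos < s->n) {
        uint8_t type = s->p[s->pos++];
        int ret;

        switch (type) {
        case SEC_EOF:
            return 0;
        case SEC_DATA:
            s->sections++;
            break;
        case SEC_CMD_NESTED:
        case SEC_CMD_QUIT:
            ret = process_command(s, type);
            if (ret < 0 || (ret & LOADVM_QUIT)) {
                return ret;     /* propagate error or quit upwards */
            }
            break;
        default:
            return -1;          /* unknown section type */
        }
    }
    return -1;                  /* stream ended without SEC_EOF */
}
```

When the inner loop ends with SEC_EOF the parent simply resumes, so only an explicit quit or an error stops the outer loop early.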

* [Qemu-devel] [PATCH v7 17/42] Add migration-capability boolean for postcopy-ram.
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (15 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 16/42] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-06-16 15:43   ` Eric Blake
                     ` (2 more replies)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 18/42] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
                   ` (25 subsequent siblings)
  42 siblings, 3 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The 'postcopy ram' capability allows postcopy migration of RAM;
note that the migration starts off in precopy mode until
postcopy mode is triggered (see the migrate_start_postcopy
patch later in the series).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  1 +
 migration/migration.c         | 23 +++++++++++++++++++++++
 qapi-schema.json              |  6 +++++-
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 1bf78f6..da4b72f 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -183,6 +183,7 @@ void migrate_add_blocker(Error *reason);
  */
 void migrate_del_blocker(Error *reason);
 
+bool migrate_postcopy_ram(void);
 bool migrate_zero_blocks(void);
 
 bool migrate_auto_converge(void);
diff --git a/migration/migration.c b/migration/migration.c
index a743018..cd89a9b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -408,6 +408,20 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
     for (cap = params; cap; cap = cap->next) {
         s->enabled_capabilities[cap->value->capability] = cap->value->state;
     }
+
+    if (migrate_postcopy_ram()) {
+        if (migrate_use_compression()) {
+            /* The decompression threads asynchronously write into RAM
+             * rather than use the atomic copies needed to avoid
+             * userfaulting.  It should be possible to fix the decompression
+             * threads for compatibility in future.
+             */
+            error_report("Postcopy is not currently compatible with "
+                         "compression");
+            s->enabled_capabilities[MIGRATION_CAPABILITY_X_POSTCOPY_RAM] =
+                false;
+        }
+    }
 }
 
 void qmp_migrate_set_parameters(bool has_compress_level,
@@ -770,6 +784,15 @@ void qmp_migrate_set_downtime(double value, Error **errp)
     max_downtime = (uint64_t)value;
 }
 
+bool migrate_postcopy_ram(void)
+{
+    MigrationState *s;
+
+    s = migrate_get_current();
+
+    return s->enabled_capabilities[MIGRATION_CAPABILITY_X_POSTCOPY_RAM];
+}
+
 bool migrate_auto_converge(void)
 {
     MigrationState *s;
diff --git a/qapi-schema.json b/qapi-schema.json
index 6e17a5c..0b6fe54 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -526,11 +526,15 @@
 # @auto-converge: If enabled, QEMU will automatically throttle down the guest
 #          to speed up convergence of RAM migration. (since 1.6)
 #
+# @x-postcopy-ram: Start executing on the migration target before all of RAM has
+#          been migrated, pulling the remaining pages along as needed. NOTE: If
+#          the migration fails during postcopy the VM will fail.  (since 2.4)
+#
 # Since: 1.2
 ##
 { 'enum': 'MigrationCapability',
   'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
-           'compress'] }
+           'compress', 'x-postcopy-ram'] }
 
 ##
 # @MigrationCapabilityStatus
-- 
2.4.3


* [Qemu-devel] [PATCH v7 18/42] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (16 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 17/42] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-13 11:02   ` Juan Quintela
  2015-07-20 10:06   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 19/42] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
                   ` (24 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The state of the postcopy process is managed via a series of messages:
   * Add wrappers and handlers for sending/receiving these messages
   * Add a state variable that tracks the current state of postcopy
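
The message ordering the handlers in this patch enforce can be summarised as
a small state machine. The sketch below is a simplified model with
hypothetical names (sk_state, sk_cmd, sk_handle); unlike the patch, which
swaps in the new state atomically and then checks the old one, it checks
first and leaves the state untouched on error.

```c
#include <assert.h>

typedef enum {
    SK_NONE = 0,   /* Initial state - no postcopy */
    SK_ADVISE,
    SK_LISTENING,
    SK_RUNNING,
    SK_END
} sk_state;

typedef enum {
    SK_CMD_ADVISE,
    SK_CMD_DISCARD,
    SK_CMD_LISTEN,
    SK_CMD_RUN
} sk_cmd;

/* Returns 0 and updates *st if cmd is legal in the current state, else -1. */
static int sk_handle(sk_state *st, sk_cmd cmd)
{
    switch (cmd) {
    case SK_CMD_ADVISE:            /* only valid before anything else */
        if (*st != SK_NONE) {
            return -1;
        }
        *st = SK_ADVISE;
        return 0;
    case SK_CMD_DISCARD:           /* 0..many, state is unchanged */
        return (*st == SK_ADVISE) ? 0 : -1;
    case SK_CMD_LISTEN:
        if (*st != SK_ADVISE) {
            return -1;
        }
        *st = SK_LISTENING;
        return 0;
    case SK_CMD_RUN:
        if (*st != SK_LISTENING) {
            return -1;
        }
        *st = SK_RUNNING;
        return 0;
    }
    return -1;
}
```

The expected sequence is therefore ADVISE, any number of RAM_DISCARD
messages, LISTEN, then RUN.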

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  16 +++
 include/sysemu/sysemu.h       |  20 ++++
 migration/migration.c         |  13 +++
 migration/savevm.c            | 247 ++++++++++++++++++++++++++++++++++++++++++
 trace-events                  |  10 ++
 5 files changed, 306 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index da4b72f..a5951ac 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -50,6 +50,15 @@ enum mig_rp_message_type {
 };
 
 typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
+
+typedef enum {
+    POSTCOPY_INCOMING_NONE = 0,  /* Initial state - no postcopy */
+    POSTCOPY_INCOMING_ADVISE,
+    POSTCOPY_INCOMING_LISTENING,
+    POSTCOPY_INCOMING_RUNNING,
+    POSTCOPY_INCOMING_END
+} PostcopyState;
+
 /* State for the incoming migration */
 struct MigrationIncomingState {
     QEMUFile *file;
@@ -62,6 +71,7 @@ struct MigrationIncomingState {
 
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
+    PostcopyState postcopy_state;
 
     /* See savevm.c */
     LoadStateEntry_Head loadvm_handlers;
@@ -231,4 +241,10 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
 
 void ram_mig_init(void);
 void savevm_skip_section_footers(void);
+
+PostcopyState postcopy_state_get(MigrationIncomingState *mis);
+
+/* Set the state and return the old state */
+PostcopyState postcopy_state_set(MigrationIncomingState *mis,
+                                 PostcopyState new_state);
 #endif
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index d8875ca..c5738f5 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -87,6 +87,17 @@ enum qemu_vm_cmd {
     MIG_CMD_INVALID = 0,       /* Must be 0 */
     MIG_CMD_OPEN_RETURN_PATH,  /* Tell the dest to open the Return path */
     MIG_CMD_PING,              /* Request a PONG on the RP */
+
+    MIG_CMD_POSTCOPY_ADVISE = 20,  /* Prior to any page transfers, just
+                                      warn we might want to do PC */
+    MIG_CMD_POSTCOPY_LISTEN,       /* Start listening for incoming
+                                      pages as it's running. */
+    MIG_CMD_POSTCOPY_RUN,          /* Start execution */
+
+    MIG_CMD_POSTCOPY_RAM_DISCARD,  /* A list of pages to discard that
+                                      were previously sent during
+                                      precopy but are dirty. */
+
 };
 
 bool qemu_savevm_state_blocked(Error **errp);
@@ -101,6 +112,15 @@ void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_open_return_path(QEMUFile *f);
+void qemu_savevm_send_postcopy_advise(QEMUFile *f);
+void qemu_savevm_send_postcopy_listen(QEMUFile *f);
+void qemu_savevm_send_postcopy_run(QEMUFile *f);
+
+void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
+                                           uint16_t len,
+                                           uint64_t *start_list,
+                                           uint64_t *end_list);
+
 int qemu_loadvm_state(QEMUFile *f);
 
 typedef enum DisplayType
diff --git a/migration/migration.c b/migration/migration.c
index cd89a9b..34cd9a6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1128,3 +1128,16 @@ void migrate_fd_connect(MigrationState *s)
     qemu_thread_create(&s->thread, "migration", migration_thread, s,
                        QEMU_THREAD_JOINABLE);
 }
+
+PostcopyState postcopy_state_get(MigrationIncomingState *mis)
+{
+    return atomic_fetch_add(&mis->postcopy_state, 0);
+}
+
+/* Set the state and return the old state */
+PostcopyState postcopy_state_set(MigrationIncomingState *mis,
+                                 PostcopyState new_state)
+{
+    return atomic_xchg(&mis->postcopy_state, new_state);
+}
+
diff --git a/migration/savevm.c b/migration/savevm.c
index 7052a6f..7b2f086 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -43,6 +43,7 @@
 #include "exec/memory.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "qemu/bitops.h"
 #include "qemu/iov.h"
 #include "block/snapshot.h"
 #include "block/qapi.h"
@@ -717,6 +718,77 @@ void qemu_savevm_send_open_return_path(QEMUFile *f)
     qemu_savevm_command_send(f, MIG_CMD_OPEN_RETURN_PATH, 0, NULL);
 }
 
+/* Send prior to any postcopy transfer */
+void qemu_savevm_send_postcopy_advise(QEMUFile *f)
+{
+    uint64_t tmp[2];
+    tmp[0] = cpu_to_be64(getpagesize());
+    tmp[1] = cpu_to_be64(1ul << qemu_target_page_bits());
+
+    trace_qemu_savevm_send_postcopy_advise();
+    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_ADVISE, 16, (uint8_t *)tmp);
+}
+
+/* Sent prior to starting the destination running in postcopy; discards pages
+ * that have already been sent but redirtied on the source.
+ * CMD_POSTCOPY_RAM_DISCARD consists of:
+ *      byte   version (0)
+ *      byte   Length of name field (not including 0)
+ *  n x byte   RAM block name
+ *      byte   0 terminator (just for safety)
+ *  n x        Byte ranges within the named RAMBlock
+ *      be64   Start of the range
+ *      be64   end of the range + 1
+ *
+ *  name:  RAMBlock name that these entries are part of
+ *  len: Number of page entries
+ *  start_list: 'len' addresses
+ *  end_list: 'len' addresses
+ *
+ */
+void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
+                                           uint16_t len,
+                                           uint64_t *start_list,
+                                           uint64_t *end_list)
+{
+    uint8_t *buf;
+    uint16_t tmplen;
+    uint16_t t;
+    size_t name_len = strlen(name);
+
+    trace_qemu_savevm_send_postcopy_ram_discard(name, len);
+    buf = g_malloc0(len*16 + name_len + 3);
+    buf[0] = 0; /* Version */
+    assert(name_len < 256);
+    buf[1] = name_len;
+    memcpy(buf+2, name, name_len);
+    tmplen = 2+name_len;
+    buf[tmplen++] = '\0';
+
+    for (t = 0; t < len; t++) {
+        cpu_to_be64w((uint64_t *)(buf + tmplen), start_list[t]);
+        tmplen += 8;
+        cpu_to_be64w((uint64_t *)(buf + tmplen), end_list[t]);
+        tmplen += 8;
+    }
+    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RAM_DISCARD, tmplen, buf);
+    g_free(buf);
+}
+
+/* Get the destination into a state where it can receive postcopy data. */
+void qemu_savevm_send_postcopy_listen(QEMUFile *f)
+{
+    trace_savevm_send_postcopy_listen();
+    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_LISTEN, 0, NULL);
+}
+
+/* Kick the destination into running */
+void qemu_savevm_send_postcopy_run(QEMUFile *f)
+{
+    trace_savevm_send_postcopy_run();
+    qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_RUN, 0, NULL);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -1020,6 +1092,154 @@ enum LoadVMExitCodes {
 
 static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
 
+/* ------ incoming postcopy messages ------ */
+/* 'advise' arrives before any transfers just to tell us that a postcopy
+ * *might* happen - it might be skipped if precopy transferred everything
+ * quickly.
+ */
+static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
+                                         uint64_t remote_hps,
+                                         uint64_t remote_tps)
+{
+    PostcopyState ps = postcopy_state_get(mis);
+    trace_loadvm_postcopy_handle_advise();
+    if (ps != POSTCOPY_INCOMING_NONE) {
+        error_report("CMD_POSTCOPY_ADVISE in wrong postcopy state (%d)", ps);
+        return -1;
+    }
+
+    if (remote_hps != getpagesize())  {
+        /*
+         * Some combinations of mismatch are probably possible but it gets
+         * a bit more complicated.  In particular we need to place whole
+         * host pages on the dest at once, and we need to ensure that we
+         * handle dirtying to make sure we never end up sending part of
+         * a host page on its own.
+         */
+        error_report("Postcopy needs matching host page sizes (s=%d d=%d)",
+                     (int)remote_hps, getpagesize());
+        return -1;
+    }
+
+    if (remote_tps != (1ul << qemu_target_page_bits())) {
+        /*
+         * Again, some differences could be dealt with, but for now keep it
+         * simple.
+         */
+        error_report("Postcopy needs matching target page sizes (s=%d d=%d)",
+                     (int)remote_tps, 1 << qemu_target_page_bits());
+        return -1;
+    }
+
+    postcopy_state_set(mis, POSTCOPY_INCOMING_ADVISE);
+
+    return 0;
+}
+
+/* After postcopy we will be told to throw some pages away since they're
+ * dirty and will have to be demand fetched.  Must happen before CPU is
+ * started.
+ * There can be 0..many of these messages, each encoding multiple pages.
+ */
+static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
+                                              uint16_t len)
+{
+    int tmp;
+    char ramid[256];
+    PostcopyState ps = postcopy_state_get(mis);
+
+    trace_loadvm_postcopy_ram_handle_discard();
+
+    if (ps != POSTCOPY_INCOMING_ADVISE) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD in wrong postcopy state (%d)",
+                     ps);
+        return -1;
+    }
+    /* We're expecting a
+     *    Version (0)
+     *    a RAM ID string (length byte, name, 0 term)
+     *    then at least 1 16 byte chunk
+     */
+    if (len < 20) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
+        return -1;
+    }
+
+    tmp = qemu_get_byte(mis->file);
+    if (tmp != 0) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD invalid version (%d)", tmp);
+        return -1;
+    }
+
+    if (!qemu_get_counted_string(mis->file, ramid)) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD Failed to read RAMBlock ID");
+        return -1;
+    }
+    tmp = qemu_get_byte(mis->file);
+    if (tmp != 0) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD missing nil (%d)", tmp);
+        return -1;
+    }
+
+    len -= 3+strlen(ramid);
+    if (len % 16) {
+        error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
+        return -1;
+    }
+    trace_loadvm_postcopy_ram_handle_discard_header(ramid, len);
+    while (len) {
+        /* TODO - ram_discard_range gets added in a later patch
+        uint64_t start_addr, end_addr;
+        start_addr = qemu_get_be64(mis->file);
+        end_addr = qemu_get_be64(mis->file);
+
+        len -= 16;
+        int ret = ram_discard_range(mis, ramid, start_addr, end_addr - 1);
+        if (ret) {
+            return ret;
+        }
+        */
+    }
+    trace_loadvm_postcopy_ram_handle_discard_end();
+
+    return 0;
+}
+
+/* After this message we must be able to immediately receive postcopy data */
+static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
+{
+    PostcopyState ps = postcopy_state_set(mis, POSTCOPY_INCOMING_LISTENING);
+    trace_loadvm_postcopy_handle_listen();
+    if (ps != POSTCOPY_INCOMING_ADVISE) {
+        error_report("CMD_POSTCOPY_LISTEN in wrong postcopy state (%d)", ps);
+        return -1;
+    }
+
+    /* TODO start up the postcopy listening thread */
+    return 0;
+}
+
+/* After all discards we can start running and asking for pages */
+static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
+{
+    PostcopyState ps = postcopy_state_set(mis, POSTCOPY_INCOMING_RUNNING);
+    trace_loadvm_postcopy_handle_run();
+    if (ps != POSTCOPY_INCOMING_LISTENING) {
+        error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps);
+        return -1;
+    }
+
+    if (autostart) {
+        /* Hold onto your hats, starting the CPU */
+        vm_start();
+    } else {
+        /* leave it paused and let management decide when to start the CPU */
+        runstate_set(RUN_STATE_PAUSED);
+    }
+
+    return 0;
+}
+
 static int loadvm_process_command_simple_lencheck(const char *name,
                                                   unsigned int actual,
                                                   unsigned int expected)
@@ -1045,6 +1265,7 @@ static int loadvm_process_command(QEMUFile *f)
     uint16_t cmd;
     uint16_t len;
     uint32_t tmp32;
+    uint64_t tmp64a, tmp64b;
 
     cmd = qemu_get_be16(f);
     len = qemu_get_be16(f);
@@ -1083,6 +1304,32 @@ static int loadvm_process_command(QEMUFile *f)
         migrate_send_rp_pong(mis, tmp32);
         break;
 
+    case MIG_CMD_POSTCOPY_ADVISE:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_ADVISE",
+                                                   len, 16)) {
+            return -1;
+        }
+        tmp64a = qemu_get_be64(f); /* hps */
+        tmp64b = qemu_get_be64(f); /* tps */
+        return loadvm_postcopy_handle_advise(mis, tmp64a, tmp64b);
+
+    case MIG_CMD_POSTCOPY_LISTEN:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_LISTEN",
+                                                   len, 0)) {
+            return -1;
+        }
+        return loadvm_postcopy_handle_listen(mis);
+
+    case MIG_CMD_POSTCOPY_RUN:
+        if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_RUN",
+                                                   len, 0)) {
+            return -1;
+        }
+        return loadvm_postcopy_handle_run(mis);
+
+    case MIG_CMD_POSTCOPY_RAM_DISCARD:
+        return loadvm_postcopy_ram_handle_discard(mis, len);
+
     default:
         error_report("VM_COMMAND 0x%x unknown (len 0x%x)", cmd, len);
         return -1;
diff --git a/trace-events b/trace-events
index 5644cc2..44ac831 100644
--- a/trace-events
+++ b/trace-events
@@ -1187,11 +1187,21 @@ qemu_loadvm_state_main(void) ""
 qemu_loadvm_state_main_quit_parent(void) ""
 qemu_loadvm_state_post_main(int ret) "%d"
 qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
+loadvm_postcopy_handle_advise(void) ""
+loadvm_postcopy_handle_listen(void) ""
+loadvm_postcopy_handle_run(void) ""
+loadvm_postcopy_ram_handle_discard(void) ""
+loadvm_postcopy_ram_handle_discard_end(void) ""
+loadvm_postcopy_ram_handle_discard_header(const char *ramid, uint16_t len) "%s: %u"
 loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
 loadvm_process_command_ping(uint32_t val) "%x"
+qemu_savevm_send_postcopy_advise(void) ""
+qemu_savevm_send_postcopy_ram_discard(const char *id, uint16_t len) "%s: %u"
 savevm_section_start(const char *id, unsigned int section_id) "%s, section_id %u"
 savevm_section_end(const char *id, unsigned int section_id, int ret) "%s, section_id %u -> %d"
 savevm_send_ping(uint32_t val) "%x"
+savevm_send_postcopy_listen(void) ""
+savevm_send_postcopy_run(void) ""
 savevm_state_begin(void) ""
 savevm_state_header(void) ""
 savevm_state_iterate(void) ""
-- 
2.4.3


* [Qemu-devel] [PATCH v7 19/42] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (17 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 18/42] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-13 11:07   ` Juan Quintela
                     ` (2 more replies)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 20/42] Modify save_live_pending for postcopy Dr. David Alan Gilbert (git)
                   ` (23 subsequent siblings)
  42 siblings, 3 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

MIG_CMD_PACKAGED is a migration command that wraps a chunk of migration
stream inside a package whose length can be determined purely by reading
its header.  The destination guarantees that the whole MIG_CMD_PACKAGED
is read off the stream prior to parsing the contents.

This is used by postcopy to load device state (from the package)
while leaving the main stream free to receive memory pages.
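
The framing idea can be sketched as a length-prefixed buffer: a 4-byte
big-endian length followed by the payload, with the reader refusing to hand
anything to a parser until the whole payload is present. The helper names
(sk_pack, sk_unpack) and the flat-buffer interface are assumptions for
illustration only; the real code sends the length as the body of a
MIG_CMD_PACKAGED command and works on QEMUFile/QEMUSizedBuffer, and the
command header is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_MAX_PACKAGED_SIZE (1ul << 24)   /* same limit as the patch */

/* Write a be32 length then the payload into out.
 * Returns the number of bytes written, or 0 if it is too large to fit. */
static size_t sk_pack(uint8_t *out, size_t out_cap,
                      const uint8_t *payload, uint32_t len)
{
    if (len > SK_MAX_PACKAGED_SIZE || out_cap < 4 || out_cap - 4 < len) {
        return 0;
    }
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)len;
    memcpy(out + 4, payload, len);
    return 4 + (size_t)len;
}

/* Read the header and only succeed once the whole payload is available.
 * Returns the bytes consumed, or 0 if incomplete or oversized. */
static size_t sk_unpack(const uint8_t *in, size_t in_len,
                        const uint8_t **payload, uint32_t *len)
{
    uint32_t l;

    if (in_len < 4) {
        return 0;
    }
    l = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
        ((uint32_t)in[2] << 8) | (uint32_t)in[3];
    if (l > SK_MAX_PACKAGED_SIZE || in_len - 4 < l) {
        return 0;
    }
    *payload = in + 4;
    *len = l;
    return 4 + (size_t)l;
}
```

Because the length is known from the header alone, the receiver can pull the
entire package off the main stream before it starts interpreting the
contents.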

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/sysemu/sysemu.h |  4 +++
 migration/savevm.c      | 94 +++++++++++++++++++++++++++++++++++++++++++++++++
 trace-events            |  4 +++
 3 files changed, 102 insertions(+)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index c5738f5..5bf8f80 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -87,6 +87,7 @@ enum qemu_vm_cmd {
     MIG_CMD_INVALID = 0,       /* Must be 0 */
     MIG_CMD_OPEN_RETURN_PATH,  /* Tell the dest to open the Return path */
     MIG_CMD_PING,              /* Request a PONG on the RP */
+    MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
 
     MIG_CMD_POSTCOPY_ADVISE = 20,  /* Prior to any page transfers, just
                                       warn we might want to do PC */
@@ -100,6 +101,8 @@ enum qemu_vm_cmd {
 
 };
 
+#define MAX_VM_CMD_PACKAGED_SIZE (1ul << 24)
+
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params);
@@ -112,6 +115,7 @@ void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_open_return_path(QEMUFile *f);
+int qemu_savevm_send_packaged(QEMUFile *f, const QEMUSizedBuffer *qsb);
 void qemu_savevm_send_postcopy_advise(QEMUFile *f);
 void qemu_savevm_send_postcopy_listen(QEMUFile *f);
 void qemu_savevm_send_postcopy_run(QEMUFile *f);
diff --git a/migration/savevm.c b/migration/savevm.c
index 7b2f086..2c4cbe1 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -718,6 +718,50 @@ void qemu_savevm_send_open_return_path(QEMUFile *f)
     qemu_savevm_command_send(f, MIG_CMD_OPEN_RETURN_PATH, 0, NULL);
 }
 
+/* We have a buffer of data to send; we don't want that all to be loaded
+ * by the command itself, so the command contains just the length of the
+ * extra buffer that we then send straight after it.
+ * TODO: Must be a better way to organise that
+ *
+ * Returns:
+ *    0 on success
+ *    -ve on error
+ */
+int qemu_savevm_send_packaged(QEMUFile *f, const QEMUSizedBuffer *qsb)
+{
+    size_t cur_iov;
+    size_t len = qsb_get_length(qsb);
+    uint32_t tmp;
+
+    if (len > MAX_VM_CMD_PACKAGED_SIZE) {
+        error_report("%s: Unreasonably large packaged state: %zu",
+                     __func__, len);
+        return -1;
+    }
+
+    tmp = cpu_to_be32(len);
+
+    trace_qemu_savevm_send_packaged();
+    qemu_savevm_command_send(f, MIG_CMD_PACKAGED, 4, (uint8_t *)&tmp);
+
+    /* all the data follows (concatenating the iovs) */
+    for (cur_iov = 0; cur_iov < qsb->n_iov; cur_iov++) {
+        /* The iov entries are partially filled */
+        size_t towrite = (qsb->iov[cur_iov].iov_len > len) ?
+                              len :
+                              qsb->iov[cur_iov].iov_len;
+        len -= towrite;
+
+        if (!towrite) {
+            break;
+        }
+
+        qemu_put_buffer(f, qsb->iov[cur_iov].iov_base, towrite);
+    }
+
+    return 0;
+}
+
 /* Send prior to any postcopy transfer */
 void qemu_savevm_send_postcopy_advise(QEMUFile *f)
 {
@@ -1253,6 +1297,48 @@ static int loadvm_process_command_simple_lencheck(const char *name,
     return 0;
 }
 
+/* Immediately following this command is a blob of data containing an embedded
+ * chunk of migration stream; read it and load it.
+ */
+static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis,
+                                      uint32_t length)
+{
+    int ret;
+    uint8_t *buffer;
+    QEMUSizedBuffer *qsb;
+
+    trace_loadvm_handle_cmd_packaged(length);
+
+    if (length > MAX_VM_CMD_PACKAGED_SIZE) {
+        error_report("Unreasonably large packaged state: %u", length);
+        return -1;
+    }
+    buffer = g_malloc0(length);
+    ret = qemu_get_buffer(mis->file, buffer, (int)length);
+    if (ret != length) {
+        g_free(buffer);
+        error_report("CMD_PACKAGED: Buffer receive fail ret=%d length=%u",
+                ret, length);
+        return (ret < 0) ? ret : -EAGAIN;
+    }
+    trace_loadvm_handle_cmd_packaged_received(ret);
+
+    /* Setup a dummy QEMUFile that actually reads from the buffer */
+    qsb = qsb_create(buffer, length);
+    g_free(buffer); /* Because qsb_create copies */
+    if (!qsb) {
+        error_report("Unable to create qsb");
+    }
+    QEMUFile *packf = qemu_bufopen("r", qsb);
+
+    ret = qemu_loadvm_state_main(packf, mis);
+    trace_loadvm_handle_cmd_packaged_main(ret);
+    qemu_fclose(packf);
+    qsb_free(qsb);
+
+    return ret;
+}
+
 /*
  * Process an incoming 'QEMU_VM_COMMAND'
  * 0           just a normal return
@@ -1304,6 +1390,14 @@ static int loadvm_process_command(QEMUFile *f)
         migrate_send_rp_pong(mis, tmp32);
         break;
 
+    case MIG_CMD_PACKAGED:
+        if (loadvm_process_command_simple_lencheck("CMD_PACKAGED",
+                                                   len, 4)) {
+            return -1;
+        }
+        tmp32 = qemu_get_be32(f);
+        return loadvm_handle_cmd_packaged(mis, tmp32);
+
     case MIG_CMD_POSTCOPY_ADVISE:
         if (loadvm_process_command_simple_lencheck("CMD_POSTCOPY_ADVISE",
                                                    len, 16)) {
diff --git a/trace-events b/trace-events
index 44ac831..299805b 100644
--- a/trace-events
+++ b/trace-events
@@ -1187,6 +1187,10 @@ qemu_loadvm_state_main(void) ""
 qemu_loadvm_state_main_quit_parent(void) ""
 qemu_loadvm_state_post_main(int ret) "%d"
 qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
+qemu_savevm_send_packaged(void) ""
+loadvm_handle_cmd_packaged(unsigned int length) "%u"
+loadvm_handle_cmd_packaged_main(int ret) "%d"
+loadvm_handle_cmd_packaged_received(int ret) "%d"
 loadvm_postcopy_handle_advise(void) ""
 loadvm_postcopy_handle_listen(void) ""
 loadvm_postcopy_handle_run(void) ""
-- 
2.4.3


* [Qemu-devel] [PATCH v7 20/42] Modify save_live_pending for postcopy
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (18 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 19/42] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-13 11:12   ` Juan Quintela
  2015-07-21  6:17   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 21/42] postcopy: OS support test Dr. David Alan Gilbert (git)
                   ` (22 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Modify save_live_pending to return separate postcopiable and
non-postcopiable counts.
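
A minimal sketch of the new calling convention: each handler fills in two
out-parameters and the caller sums them separately. The handler names and the
byte counts are made up for illustration; only the split into postcopiable
(RAM) and non-postcopiable (block) totals reflects the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler type: reports both kinds of pending bytes. */
typedef void (*sk_pending_fn)(uint64_t *non_postcopiable,
                              uint64_t *postcopiable);

/* Block data cannot be postcopied, so it all counts as non-postcopiable. */
static void sk_block_pending(uint64_t *non_post, uint64_t *post)
{
    *non_post = 4096;   /* arbitrary example value */
    *post = 0;
}

/* RAM can be postcopied, so it is reported in the other counter. */
static void sk_ram_pending(uint64_t *non_post, uint64_t *post)
{
    *non_post = 0;
    *post = 8192;       /* arbitrary example value */
}

/* Accumulate both totals across all registered handlers. */
static void sk_state_pending(const sk_pending_fn *handlers, size_t n,
                             uint64_t *res_non_post, uint64_t *res_post)
{
    size_t i;

    assert(res_non_post && res_post);
    *res_non_post = 0;
    *res_post = 0;

    for (i = 0; i < n; i++) {
        uint64_t tmp_non_post, tmp_post;

        handlers[i](&tmp_non_post, &tmp_post);
        *res_non_post += tmp_non_post;
        *res_post += tmp_post;
    }
}
```

A caller that only cares about the overall amount left can still add the two
results together, which is what migration_thread does in this patch.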

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/vmstate.h |  5 +++--
 include/sysemu/sysemu.h     |  4 +++-
 migration/block.c           |  7 +++++--
 migration/migration.c       |  9 +++++++--
 migration/ram.c             |  8 ++++++--
 migration/savevm.c          | 21 +++++++++++++++++----
 trace-events                |  2 +-
 7 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 074747c..7257196 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -54,8 +54,9 @@ typedef struct SaveVMHandlers {
 
     /* This runs outside the iothread lock!  */
     int (*save_live_setup)(QEMUFile *f, void *opaque);
-    uint64_t (*save_live_pending)(QEMUFile *f, void *opaque, uint64_t max_size);
-
+    void (*save_live_pending)(QEMUFile *f, void *opaque, uint64_t max_size,
+                              uint64_t *non_postcopiable_pending,
+                              uint64_t *postcopiable_pending);
     LoadStateHandler *load_state;
 } SaveVMHandlers;
 
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 5bf8f80..ff6bb2c 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -110,7 +110,9 @@ void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f);
 void qemu_savevm_state_complete_precopy(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
-uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
+void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
+                               uint64_t *res_non_postcopiable,
+                               uint64_t *res_postcopiable);
 void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
                               uint16_t len, uint8_t *data);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
diff --git a/migration/block.c b/migration/block.c
index 3005668..4483ce3 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -754,7 +754,9 @@ static int block_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static uint64_t block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
+static void block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+                               uint64_t *non_postcopiable_pending,
+                               uint64_t *postcopiable_pending)
 {
     /* Estimate pending number of bytes to send */
     uint64_t pending;
@@ -773,7 +775,8 @@ static uint64_t block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
     qemu_mutex_unlock_iothread();
 
     DPRINTF("Enter save live pending  %" PRIu64 "\n", pending);
-    return pending;
+    *non_postcopiable_pending = pending;
+    *postcopiable_pending = 0;
 }
 
 static int block_load(QEMUFile *f, void *opaque, int version_id)
diff --git a/migration/migration.c b/migration/migration.c
index 34cd9a6..e77b8b4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1024,8 +1024,13 @@ static void *migration_thread(void *opaque)
         uint64_t pending_size;
 
         if (!qemu_file_rate_limit(s->file)) {
-            pending_size = qemu_savevm_state_pending(s->file, max_size);
-            trace_migrate_pending(pending_size, max_size);
+            uint64_t pend_post, pend_nonpost;
+
+            qemu_savevm_state_pending(s->file, max_size, &pend_nonpost,
+                                      &pend_post);
+            pending_size = pend_nonpost + pend_post;
+            trace_migrate_pending(pending_size, max_size,
+                                  pend_post, pend_nonpost);
             if (pending_size && pending_size >= max_size) {
                 qemu_savevm_state_iterate(s->file);
             } else {
diff --git a/migration/ram.c b/migration/ram.c
index 492ed8a..fb24954 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1273,7 +1273,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
     return 0;
 }
 
-static uint64_t ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
+static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+                             uint64_t *non_postcopiable_pending,
+                             uint64_t *postcopiable_pending)
 {
     uint64_t remaining_size;
 
@@ -1287,7 +1289,9 @@ static uint64_t ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
         qemu_mutex_unlock_iothread();
         remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;
     }
-    return remaining_size;
+
+    *non_postcopiable_pending = 0;
+    *postcopiable_pending = remaining_size;
 }
 
 static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
diff --git a/migration/savevm.c b/migration/savevm.c
index 2c4cbe1..ebd3d31 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1012,10 +1012,20 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f)
     qemu_fflush(f);
 }
 
-uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
+/* Give an estimate of the amount left to be transferred,
+ * the result is split into the amount for units that can and
+ * for units that can't do postcopy.
+ */
+void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
+                               uint64_t *res_non_postcopiable,
+                               uint64_t *res_postcopiable)
 {
     SaveStateEntry *se;
-    uint64_t ret = 0;
+    uint64_t tmp_non_postcopiable, tmp_postcopiable;
+
+    *res_non_postcopiable = 0;
+    *res_postcopiable = 0;
+
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
         if (!se->ops || !se->ops->save_live_pending) {
@@ -1026,9 +1036,12 @@ uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
                 continue;
             }
         }
-        ret += se->ops->save_live_pending(f, se->opaque, max_size);
+        se->ops->save_live_pending(f, se->opaque, max_size,
+                                   &tmp_non_postcopiable, &tmp_postcopiable);
+
+        *res_postcopiable += tmp_postcopiable;
+        *res_non_postcopiable += tmp_non_postcopiable;
     }
-    return ret;
 }
 
 void qemu_savevm_state_cancel(void)
diff --git a/trace-events b/trace-events
index 299805b..339eb71 100644
--- a/trace-events
+++ b/trace-events
@@ -1419,7 +1419,7 @@ migrate_fd_cleanup(void) ""
 migrate_fd_cleanup_src_rp(void) ""
 migrate_fd_error(void) ""
 migrate_fd_cancel(void) ""
-migrate_pending(uint64_t size, uint64_t max) "pending size %" PRIu64 " max %" PRIu64
+migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
 open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
-- 
2.4.3
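
As a rough illustration of the accounting in the patch above, the sketch
below sums per-handler estimates into two totals in the same way as the
reworked qemu_savevm_state_pending.  The pending_fn type, the two example
callbacks and their byte counts are hypothetical stand-ins, not QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a save_live_pending callback. */
typedef void (*pending_fn)(uint64_t max_size,
                           uint64_t *non_postcopiable,
                           uint64_t *postcopiable);

/* Block-like device: nothing can be deferred to postcopy. */
static void block_like_pending(uint64_t max_size, uint64_t *non_post,
                               uint64_t *post)
{
    (void)max_size;
    *non_post = 4096;   /* made-up figure */
    *post = 0;
}

/* RAM-like device: everything remaining may be sent during postcopy. */
static void ram_like_pending(uint64_t max_size, uint64_t *non_post,
                             uint64_t *post)
{
    (void)max_size;
    *non_post = 0;
    *post = 8192;       /* made-up figure */
}

/* Sum both categories over all handlers, skipping missing callbacks. */
static void total_pending(const pending_fn *handlers, size_t n,
                          uint64_t max_size,
                          uint64_t *res_non_post, uint64_t *res_post)
{
    *res_non_post = 0;
    *res_post = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t tmp_non_post, tmp_post;

        if (!handlers[i]) {
            continue;
        }
        handlers[i](max_size, &tmp_non_post, &tmp_post);
        *res_non_post += tmp_non_post;
        *res_post += tmp_post;
    }
}
```

The caller can still add the two results to get the old single figure, as
migration_thread does for pending_size.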

* [Qemu-devel] [PATCH v7 21/42] postcopy: OS support test
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (19 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 20/42] Modify save_live_pending for postcopy Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-13 11:20   ` Juan Quintela
                     ` (2 more replies)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 22/42] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
                   ` (21 subsequent siblings)
  42 siblings, 3 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Provide a check to see if the OS we're running on has all the bits
needed for postcopy.

Creates postcopy-ram.c which will get most of the other helpers we need.
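
The capability checks in this patch follow one pattern: build a mask of
required bits, compare it against what the kernel reports, and print the
bits that are missing.  A minimal sketch of that pattern is below; the
EX_OP_* bit numbers and both function names are hypothetical, the real bit
numbers come from <linux/userfaultfd.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit numbers, used only for this illustration. */
enum { EX_OP_WAKE = 2, EX_OP_COPY = 3, EX_OP_ZEROPAGE = 4 };

/* Build the mask of operations the caller requires. */
static uint64_t required_ops(void)
{
    return (uint64_t)1 << EX_OP_WAKE |
           (uint64_t)1 << EX_OP_COPY |
           (uint64_t)1 << EX_OP_ZEROPAGE;
}

/* Return the required bits absent from 'available'; 0 means all present. */
static uint64_t missing_features(uint64_t available, uint64_t required)
{
    return ~available & required;
}
```

A non-zero result is what the error_report calls in the patch print, so the
user can see exactly which operations the host lacks.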

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/postcopy-ram.h |  19 +++++
 migration/Makefile.objs          |   2 +-
 migration/postcopy-ram.c         | 158 +++++++++++++++++++++++++++++++++++++++
 migration/savevm.c               |   5 ++
 4 files changed, 183 insertions(+), 1 deletion(-)
 create mode 100644 include/migration/postcopy-ram.h
 create mode 100644 migration/postcopy-ram.c

diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
new file mode 100644
index 0000000..d81934f
--- /dev/null
+++ b/include/migration/postcopy-ram.h
@@ -0,0 +1,19 @@
+/*
+ * Postcopy migration for RAM
+ *
+ * Copyright 2013 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *  Dave Gilbert  <dgilbert@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef QEMU_POSTCOPY_RAM_H
+#define QEMU_POSTCOPY_RAM_H
+
+/* Return true if the host supports everything we need to do postcopy-ram */
+bool postcopy_ram_supported_by_host(void);
+
+#endif
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index d929e96..0cac6d7 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,7 +1,7 @@
 common-obj-y += migration.o tcp.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
-common-obj-y += xbzrle.o
+common-obj-y += xbzrle.o postcopy-ram.o
 
 common-obj-$(CONFIG_RDMA) += rdma.o
 common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
new file mode 100644
index 0000000..baf83f2
--- /dev/null
+++ b/migration/postcopy-ram.c
@@ -0,0 +1,158 @@
+/*
+ * Postcopy migration for RAM
+ *
+ * Copyright 2013-2015 Red Hat, Inc. and/or its affiliates
+ *
+ * Authors:
+ *  Dave Gilbert  <dgilbert@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/*
+ * Postcopy is a migration technique where the execution flips from the
+ * source to the destination before all the data has been copied.
+ */
+
+#include <glib.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include "qemu-common.h"
+#include "migration/migration.h"
+#include "migration/postcopy-ram.h"
+#include "sysemu/sysemu.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+
+/* Postcopy needs to detect accesses to pages that haven't yet been copied
+ * across, and efficiently map new pages in, the techniques for doing this
+ * are target OS specific.
+ */
+#if defined(__linux__)
+
+#include <sys/mman.h>
+#include <sys/ioctl.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <asm/types.h> /* for __u64 */
+#endif
+
+#if defined(__linux__) && defined(__NR_userfaultfd)
+#include <linux/userfaultfd.h>
+
+static bool ufd_version_check(int ufd)
+{
+    struct uffdio_api api_struct;
+    uint64_t ioctl_mask;
+
+    api_struct.api = UFFD_API;
+    api_struct.features = 0;
+    if (ioctl(ufd, UFFDIO_API, &api_struct)) {
+        error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
+                     strerror(errno));
+        return false;
+    }
+
+    ioctl_mask = (__u64)1 << _UFFDIO_REGISTER |
+                 (__u64)1 << _UFFDIO_UNREGISTER;
+    if ((api_struct.ioctls & ioctl_mask) != ioctl_mask) {
+        error_report("Missing userfault features: %" PRIx64,
+                     (uint64_t)(~api_struct.ioctls & ioctl_mask));
+        return false;
+    }
+
+    return true;
+}
+
+bool postcopy_ram_supported_by_host(void)
+{
+    long pagesize = getpagesize();
+    int ufd = -1;
+    bool ret = false; /* Error unless we change it */
+    void *testarea = NULL;
+    struct uffdio_register reg_struct;
+    struct uffdio_range range_struct;
+    uint64_t feature_mask;
+
+    if ((1ul << qemu_target_page_bits()) > pagesize) {
+        error_report("Target page size bigger than host page size");
+        goto out;
+    }
+
+    ufd = syscall(__NR_userfaultfd, O_CLOEXEC);
+    if (ufd == -1) {
+        error_report("%s: userfaultfd not available: %s", __func__,
+                     strerror(errno));
+        goto out;
+    }
+
+    /* Version and features check */
+    if (!ufd_version_check(ufd)) {
+        goto out;
+    }
+
+    /*
+     *  We need to check that the ops we need are supported on anon memory
+     *  To do that we need to register a chunk and see the flags that
+     *  are returned.
+     */
+    testarea = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE |
+                                    MAP_ANONYMOUS, -1, 0);
+    if (testarea == MAP_FAILED) {
+        error_report("%s: Failed to map test area: %s", __func__,
+                     strerror(errno));
+        goto out;
+    }
+    g_assert(((size_t)testarea & (pagesize-1)) == 0);
+
+    reg_struct.range.start = (uintptr_t)testarea;
+    reg_struct.range.len = pagesize;
+    reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
+
+    if (ioctl(ufd, UFFDIO_REGISTER, &reg_struct)) {
+        error_report("%s userfault register: %s", __func__, strerror(errno));
+        goto out;
+    }
+
+    range_struct.start = (uintptr_t)testarea;
+    range_struct.len = pagesize;
+    if (ioctl(ufd, UFFDIO_UNREGISTER, &range_struct)) {
+        error_report("%s userfault unregister: %s", __func__, strerror(errno));
+        goto out;
+    }
+
+    feature_mask = (__u64)1 << _UFFDIO_WAKE |
+                   (__u64)1 << _UFFDIO_COPY |
+                   (__u64)1 << _UFFDIO_ZEROPAGE;
+    if ((reg_struct.ioctls & feature_mask) != feature_mask) {
+        error_report("Missing userfault map features: %" PRIx64,
+                     (uint64_t)(~reg_struct.ioctls & feature_mask));
+        goto out;
+    }
+
+    /* Success! */
+    ret = true;
+out:
+    if (testarea && testarea != MAP_FAILED) {
+        munmap(testarea, pagesize);
+    }
+    if (ufd != -1) {
+        close(ufd);
+    }
+    return ret;
+}
+
+#else
+/* No target OS support, stubs just fail */
+
+bool postcopy_ram_supported_by_host(void)
+{
+    error_report("%s: No OS support", __func__);
+    return false;
+}
+
+#endif
+
diff --git a/migration/savevm.c b/migration/savevm.c
index ebd3d31..f324c6e 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -37,6 +37,7 @@
 #include "qemu/timer.h"
 #include "audio/audio.h"
 #include "migration/migration.h"
+#include "migration/postcopy-ram.h"
 #include "qemu/sockets.h"
 #include "qemu/queue.h"
 #include "sysemu/cpus.h"
@@ -1165,6 +1166,10 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
         return -1;
     }
 
+    if (!postcopy_ram_supported_by_host()) {
+        return -1;
+    }
+
     if (remote_hps != getpagesize())  {
         /*
          * Some combinations of mismatch are probably possible but it gets
-- 
2.4.3

* [Qemu-devel] [PATCH v7 22/42] migrate_start_postcopy: Command to trigger transition to postcopy
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (20 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 21/42] postcopy: OS support test Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-13 11:23   ` Juan Quintela
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 23/42] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
                   ` (20 subsequent siblings)
  42 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Once postcopy is enabled (with migrate_set_capability), the migration
will still start in precopy mode.  To cause a transition into postcopy,
the:

  migrate_start_postcopy

command must be issued.  Postcopy will start sometime after this
(when it's next checked in the migration loop).

Issuing the command before migration has started will error,
and issuing after it has finished is ignored.
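
The rules above can be sketched as a small state check.  This is only an
illustration under assumed names: the ex_status enum, the ex_migration
struct and ex_start_postcopy are hypothetical and much simpler than the
real MigrationState.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { EX_NONE, EX_SETUP, EX_ACTIVE, EX_COMPLETED } ex_status;

typedef struct {
    ex_status state;
    bool postcopy_capability;   /* set before migration starts */
    bool start_postcopy;        /* polled later by the migration loop */
} ex_migration;

/* Returns NULL on success, or a message describing the error. */
static const char *ex_start_postcopy(ex_migration *m)
{
    if (!m->postcopy_capability) {
        return "postcopy capability not enabled";
    }
    if (m->state == EX_NONE) {
        return "migration has not been started";
    }
    /*
     * No error once migration has completed: the caller cannot tell that
     * it finished a moment ago, so the request is accepted and simply has
     * no effect.
     */
    m->start_postcopy = true;
    return NULL;
}
```

The command only records the request; the switch itself happens when the
migration loop next looks at the flag.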

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 hmp-commands.hx               | 15 +++++++++++++++
 hmp.c                         |  7 +++++++
 hmp.h                         |  1 +
 include/migration/migration.h |  3 +++
 migration/migration.c         | 22 ++++++++++++++++++++++
 qapi-schema.json              |  8 ++++++++
 qmp-commands.hx               | 19 +++++++++++++++++++
 7 files changed, 75 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 3d7dfcc..5124698 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1006,6 +1006,21 @@ Set the parameter @var{parameter} for migration.
 ETEXI
 
     {
+        .name       = "migrate_start_postcopy",
+        .args_type  = "",
+        .params     = "",
+        .help       = "Switch migration to postcopy mode",
+        .mhandler.cmd = hmp_migrate_start_postcopy,
+    },
+
+STEXI
+@item migrate_start_postcopy
+@findex migrate_start_postcopy
+Switch in-progress migration to postcopy mode. Ignored after the end of
+migration (or once already in postcopy).
+ETEXI
+
+    {
         .name       = "client_migrate_info",
         .args_type  = "protocol:s,hostname:s,port:i?,tls-port:i?,cert-subject:s?",
         .params     = "protocol hostname port tls-port cert-subject",
diff --git a/hmp.c b/hmp.c
index 514f22f..bf1ad53 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1268,6 +1268,13 @@ void hmp_client_migrate_info(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &err);
 }
 
+void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict)
+{
+    Error *err = NULL;
+    qmp_migrate_start_postcopy(&err);
+    hmp_handle_error(mon, &err);
+}
+
 void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
     const char *protocol  = qdict_get_str(qdict, "protocol");
diff --git a/hmp.h b/hmp.h
index a70ac4f..97901c2 100644
--- a/hmp.h
+++ b/hmp.h
@@ -68,6 +68,7 @@ void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
 void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
 void hmp_client_migrate_info(Monitor *mon, const QDict *qdict);
+void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict);
 void hmp_set_password(Monitor *mon, const QDict *qdict);
 void hmp_expire_password(Monitor *mon, const QDict *qdict);
 void hmp_eject(Monitor *mon, const QDict *qdict);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index a5951ac..e973490 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -111,6 +111,9 @@ struct MigrationState
     int64_t xbzrle_cache_size;
     int64_t setup_time;
     int64_t dirty_sync_count;
+
+    /* Flag set once the migration has been asked to enter postcopy */
+    bool start_postcopy;
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/migration/migration.c b/migration/migration.c
index e77b8b4..6fc47f9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -465,6 +465,28 @@ void qmp_migrate_set_parameters(bool has_compress_level,
     }
 }
 
+void qmp_migrate_start_postcopy(Error **errp)
+{
+    MigrationState *s = migrate_get_current();
+
+    if (!migrate_postcopy_ram()) {
+        error_setg(errp, "Enable postcopy with migrate_set_capability before"
+                         " the start of migration");
+        return;
+    }
+
+    if (s->state == MIGRATION_STATUS_NONE) {
+        error_setg(errp, "Postcopy must be started after migration has been"
+                         " started");
+        return;
+    }
+    /*
+     * we don't error if migration has finished since that would be racy
+     * with issuing this command.
+     */
+    atomic_set(&s->start_postcopy, true);
+}
+
 /* shared migration helpers */
 
 static void migrate_set_state(MigrationState *s, int old_state, int new_state)
diff --git a/qapi-schema.json b/qapi-schema.json
index 0b6fe54..b0177bb 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -661,6 +661,14 @@
             '*tls-port': 'int', '*cert-subject': 'str' } }
 
 ##
+# @migrate-start-postcopy
+#
+# Switch migration to postcopy mode
+#
+# Since: 2.4
+{ 'command': 'migrate-start-postcopy' }
+
+##
 # @MouseInfo:
 #
 # Information about a mouse device.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 867a21f..dc63ff3 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -713,6 +713,25 @@ Example:
 
 EQMP
     {
+        .name       = "migrate-start-postcopy",
+        .args_type  = "",
+        .mhandler.cmd_new = qmp_marshal_input_migrate_start_postcopy,
+    },
+
+SQMP
+migrate-start-postcopy
+----------------------
+
+Switch an in-progress migration to postcopy mode. Ignored after the end of
+migration (or once already in postcopy).
+
+Example:
+-> { "execute": "migrate-start-postcopy" }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "query-migrate-cache-size",
         .args_type  = "",
         .mhandler.cmd_new = qmp_marshal_input_query_migrate_cache_size,
-- 
2.4.3

* [Qemu-devel] [PATCH v7 23/42] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (21 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 22/42] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-13 11:27   ` Juan Quintela
  2015-07-21 10:33   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 24/42] Add qemu_savevm_state_complete_postcopy Dr. David Alan Gilbert (git)
                   ` (19 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

'MIGRATION_STATUS_POSTCOPY_ACTIVE' is entered after migrate_start_postcopy.

'migration_postcopy_phase' is provided for other sections to know if
they're in postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 include/migration/migration.h |  2 ++
 migration/migration.c         | 56 ++++++++++++++++++++++++++++++++++++-------
 qapi-schema.json              |  4 +++-
 trace-events                  |  1 +
 4 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index e973490..2a22381 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -154,6 +154,8 @@ MigrationState *migrate_init(const MigrationParams *params);
 bool migration_in_setup(MigrationState *);
 bool migration_has_finished(MigrationState *);
 bool migration_has_failed(MigrationState *);
+/* True if outgoing migration has entered postcopy phase */
+bool migration_postcopy_phase(MigrationState *);
 MigrationState *migrate_get_current(void);
 
 void migrate_compress_threads_create(void);
diff --git a/migration/migration.c b/migration/migration.c
index 6fc47f9..22be23e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -288,6 +288,7 @@ static bool migration_already_active(MigrationState *ms)
 {
     switch (ms->state) {
     case MIGRATION_STATUS_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_SETUP:
         return true;
 
@@ -358,6 +359,39 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
         get_xbzrle_cache_stats(info);
         break;
+    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+        /* Mostly the same as active; TODO add some postcopy stats */
+        info->has_status = true;
+        info->has_total_time = true;
+        info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME)
+            - s->total_time;
+        info->has_expected_downtime = true;
+        info->expected_downtime = s->expected_downtime;
+        info->has_setup_time = true;
+        info->setup_time = s->setup_time;
+
+        info->has_ram = true;
+        info->ram = g_malloc0(sizeof(*info->ram));
+        info->ram->transferred = ram_bytes_transferred();
+        info->ram->remaining = ram_bytes_remaining();
+        info->ram->total = ram_bytes_total();
+        info->ram->duplicate = dup_mig_pages_transferred();
+        info->ram->skipped = skipped_mig_pages_transferred();
+        info->ram->normal = norm_mig_pages_transferred();
+        info->ram->normal_bytes = norm_mig_bytes_transferred();
+        info->ram->dirty_pages_rate = s->dirty_pages_rate;
+        info->ram->mbps = s->mbps;
+
+        if (blk_mig_active()) {
+            info->has_disk = true;
+            info->disk = g_malloc0(sizeof(*info->disk));
+            info->disk->transferred = blk_mig_bytes_transferred();
+            info->disk->remaining = blk_mig_bytes_remaining();
+            info->disk->total = blk_mig_bytes_total();
+        }
+
+        get_xbzrle_cache_stats(info);
+        break;
     case MIGRATION_STATUS_COMPLETED:
         get_xbzrle_cache_stats(info);
 
@@ -399,8 +433,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
     MigrationState *s = migrate_get_current();
     MigrationCapabilityStatusList *cap;
 
-    if (s->state == MIGRATION_STATUS_ACTIVE ||
-        s->state == MIGRATION_STATUS_SETUP) {
+    if (migration_already_active(s)) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
     }
@@ -531,7 +564,8 @@ static void migrate_fd_cleanup(void *opaque)
         s->file = NULL;
     }
 
-    assert(s->state != MIGRATION_STATUS_ACTIVE);
+    assert((s->state != MIGRATION_STATUS_ACTIVE) &&
+           (s->state != MIGRATION_STATUS_POSTCOPY_ACTIVE));
 
     if (s->state != MIGRATION_STATUS_COMPLETED) {
         qemu_savevm_state_cancel();
@@ -566,8 +600,7 @@ static void migrate_fd_cancel(MigrationState *s)
 
     do {
         old_state = s->state;
-        if (old_state != MIGRATION_STATUS_SETUP &&
-            old_state != MIGRATION_STATUS_ACTIVE) {
+        if (!migration_already_active(s)) {
             break;
         }
         migrate_set_state(s, old_state, MIGRATION_STATUS_CANCELLING);
@@ -611,6 +644,11 @@ bool migration_has_failed(MigrationState *s)
             s->state == MIGRATION_STATUS_FAILED);
 }
 
+bool migration_postcopy_phase(MigrationState *s)
+{
+    return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
+}
+
 MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
@@ -692,8 +730,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
     params.blk = has_blk && blk;
     params.shared = has_inc && inc;
 
-    if (s->state == MIGRATION_STATUS_ACTIVE ||
-        s->state == MIGRATION_STATUS_SETUP ||
+    if (migration_already_active(s) ||
         s->state == MIGRATION_STATUS_CANCELLING) {
         error_set(errp, QERR_MIGRATION_ACTIVE);
         return;
@@ -1041,7 +1078,10 @@ static void *migration_thread(void *opaque)
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
     migrate_set_state(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_ACTIVE);
 
-    while (s->state == MIGRATION_STATUS_ACTIVE) {
+    trace_migration_thread_setup_complete();
+
+    while (s->state == MIGRATION_STATUS_ACTIVE ||
+           s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
         int64_t current_time;
         uint64_t pending_size;
 
diff --git a/qapi-schema.json b/qapi-schema.json
index b0177bb..5e2f487 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -424,6 +424,8 @@
 #
 # @active: in the process of doing migration.
 #
+# @postcopy-active: like active, but now in postcopy mode. (since 2.4)
+#
 # @completed: migration is finished.
 #
 # @failed: some error occurred during migration process.
@@ -433,7 +435,7 @@
 ##
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
-            'active', 'completed', 'failed' ] }
+            'active', 'postcopy-active', 'completed', 'failed' ] }
 
 ##
 # @MigrationInfo
diff --git a/trace-events b/trace-events
index 339eb71..72e9889 100644
--- a/trace-events
+++ b/trace-events
@@ -1421,6 +1421,7 @@ migrate_fd_error(void) ""
 migrate_fd_cancel(void) ""
 migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
+migration_thread_setup_complete(void) ""
 open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
 source_return_path_thread_bad_end(void) ""
-- 
2.4.3

* [Qemu-devel] [PATCH v7 24/42] Add qemu_savevm_state_complete_postcopy
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (22 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 23/42] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-13 11:35   ` Juan Quintela
  2015-07-21 10:42   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 25/42] Postcopy: Maintain sentmap and calculate discard Dr. David Alan Gilbert (git)
                   ` (18 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add qemu_savevm_state_complete_postcopy to complement
qemu_savevm_state_complete_precopy together with a new
save_live_complete_postcopy method on devices.

The save_live_complete_precopy method is called on
all devices during a precopy migration, and all non-postcopy
devices during a postcopy migration at the transition.

The save_live_complete_postcopy method is called at
the end of postcopy for all postcopiable devices.
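
To make the split concrete, the sketch below decides which completion pass
would call a given device.  The ex_device struct and both helpers are
hypothetical; they mirror only the skip conditions described above, not the
section framing or error handling in savevm.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device description: which completion hooks it provides. */
typedef struct {
    bool has_complete_precopy;
    bool has_complete_postcopy;
} ex_device;

/* Would the precopy completion pass call this device? */
static bool ex_runs_in_precopy_complete(const ex_device *d, bool in_postcopy)
{
    if (!d->has_complete_precopy) {
        return false;
    }
    /* In postcopy, postcopiable devices are left for the postcopy pass. */
    if (in_postcopy && d->has_complete_postcopy) {
        return false;
    }
    return true;
}

/* Would the postcopy completion pass (end of postcopy) call this device? */
static bool ex_runs_in_postcopy_complete(const ex_device *d)
{
    return d->has_complete_postcopy;
}
```

With a RAM-like device (both hooks) and a disk-like device (precopy hook
only), a plain precopy migration completes both in one pass, while a
postcopy migration completes the disk at the transition and RAM at the end.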

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/vmstate.h |  1 +
 include/sysemu/sysemu.h     |  1 +
 migration/ram.c             |  1 +
 migration/savevm.c          | 51 ++++++++++++++++++++++++++++++++++++++++++---
 4 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 7257196..dddeadd 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -40,6 +40,7 @@ typedef struct SaveVMHandlers {
     SaveStateHandler *save_state;
 
     void (*cancel)(void *opaque);
+    int (*save_live_complete_postcopy)(QEMUFile *f, void *opaque);
     int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);
 
     /* This runs both outside and inside the iothread lock.  */
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index ff6bb2c..1af2ea0 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -108,6 +108,7 @@ void qemu_savevm_state_begin(QEMUFile *f,
                              const MigrationParams *params);
 void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f);
+void qemu_savevm_state_complete_postcopy(QEMUFile *f);
 void qemu_savevm_state_complete_precopy(QEMUFile *f);
 void qemu_savevm_state_cancel(void);
 void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
diff --git a/migration/ram.c b/migration/ram.c
index fb24954..ff1a2fb 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1608,6 +1608,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
 static SaveVMHandlers savevm_ram_handlers = {
     .save_live_setup = ram_save_setup,
     .save_live_iterate = ram_save_iterate,
+    .save_live_complete_postcopy = ram_save_complete,
     .save_live_complete_precopy = ram_save_complete,
     .save_live_pending = ram_save_pending,
     .load_state = ram_load,
diff --git a/migration/savevm.c b/migration/savevm.c
index f324c6e..b7f17b4 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -938,7 +938,47 @@ int qemu_savevm_state_iterate(QEMUFile *f)
 static bool should_send_vmdesc(void)
 {
     MachineState *machine = MACHINE(qdev_get_machine());
-    return !machine->suppress_vmdesc;
+    bool in_postcopy = migration_postcopy_phase(migrate_get_current());
+    return !machine->suppress_vmdesc && !in_postcopy;
+}
+
+/*
+ * Calls the save_live_complete_postcopy methods
+ * causing the last few pages to be sent immediately and doing any associated
+ * cleanup.
+ * Note postcopy also calls qemu_savevm_state_complete_precopy to complete
+ * all the other devices, but that happens at the point we switch to postcopy.
+ */
+void qemu_savevm_state_complete_postcopy(QEMUFile *f)
+{
+    SaveStateEntry *se;
+    int ret;
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (!se->ops || !se->ops->save_live_complete_postcopy) {
+            continue;
+        }
+        if (se->ops && se->ops->is_active) {
+            if (!se->ops->is_active(se->opaque)) {
+                continue;
+            }
+        }
+        trace_savevm_section_start(se->idstr, se->section_id);
+        /* Section type */
+        qemu_put_byte(f, QEMU_VM_SECTION_END);
+        qemu_put_be32(f, se->section_id);
+
+        ret = se->ops->save_live_complete_postcopy(f, se->opaque);
+        trace_savevm_section_end(se->idstr, se->section_id, ret);
+        save_section_footer(f, se);
+        if (ret < 0) {
+            qemu_file_set_error(f, ret);
+            return;
+        }
+    }
+
+    qemu_put_byte(f, QEMU_VM_EOF);
+    qemu_fflush(f);
 }
 
 void qemu_savevm_state_complete_precopy(QEMUFile *f)
@@ -947,13 +987,15 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f)
     int vmdesc_len;
     SaveStateEntry *se;
     int ret;
+    bool in_postcopy = migration_postcopy_phase(migrate_get_current());
 
     trace_savevm_state_complete_precopy();
 
     cpu_synchronize_all_states();
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
-        if (!se->ops || !se->ops->save_live_complete_precopy) {
+        if (!se->ops || !se->ops->save_live_complete_precopy ||
+            (in_postcopy && se->ops->save_live_complete_postcopy)) {
             continue;
         }
         if (se->ops && se->ops->is_active) {
@@ -997,7 +1039,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f)
         save_section_footer(f, se);
     }
 
-    qemu_put_byte(f, QEMU_VM_EOF);
+    if (!in_postcopy) {
+        /* Postcopy stream will still be going */
+        qemu_put_byte(f, QEMU_VM_EOF);
+    }
 
     json_end_array(vmdesc);
     qjson_finish(vmdesc);
-- 
2.4.3

* [Qemu-devel] [PATCH v7 25/42] Postcopy: Maintain sentmap and calculate discard
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (23 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 24/42] Add qemu_savevm_state_complete_postcopy Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-13 11:47   ` Juan Quintela
  2015-07-21 11:36   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 26/42] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
                   ` (17 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Where postcopy is preceded by a period of precopy, the destination will
have received pages that may have been dirtied on the source after the
page was sent.  The destination must throw these pages away before
starting its CPUs.

Maintain a 'sentmap' of pages that have already been sent.
Calculate the list of pages that are both sent and dirty.
Provide helpers on the destination side to discard these.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
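A minimal sketch, not part of the patch, of the calculation described
above: AND the sentmap with the dirty bitmap, then walk the result and
collect each run of set bits as an inclusive page range.  The toy_*
names, the 64-bit word layout and the fixed-size output array are all
assumptions made for illustration; the patch itself uses QEMU's
bitmap_and(), find_next_bit() and find_next_zero_bit().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy bitmap: one bit per page, 64 pages per word. */
#define TOY_BITS_PER_WORD 64

static int toy_test_bit(const uint64_t *map, size_t bit)
{
    return (map[bit / TOY_BITS_PER_WORD] >> (bit % TOY_BITS_PER_WORD)) & 1;
}

/* sentmap &= dirty: keep only pages that were sent and later dirtied. */
static void toy_sent_and_dirty(uint64_t *sentmap, const uint64_t *dirty,
                               size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        sentmap[i] &= dirty[i];
    }
}

struct toy_range {
    size_t start;   /* first page of the run */
    size_t end;     /* last page of the run (inclusive) */
};

/*
 * Walk bits first..last (inclusive) and record each run of set bits.
 * Returns the number of runs found; at most max_runs are stored in out.
 */
static size_t toy_collect_runs(const uint64_t *map, size_t first, size_t last,
                               struct toy_range *out, size_t max_runs)
{
    size_t nruns = 0;
    size_t cur = first;

    assert(first <= last);
    while (cur <= last) {
        if (!toy_test_bit(map, cur)) {
            cur++;
            continue;
        }
        size_t run_start = cur;
        while (cur <= last && toy_test_bit(map, cur)) {
            cur++;
        }
        if (nruns < max_runs) {
            out[nruns].start = run_start;
            out[nruns].end = cur - 1;
        }
        nruns++;
    }
    return nruns;
}
```

Each collected range corresponds to one call to
postcopy_discard_send_range() in the patch.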
 include/migration/migration.h    |  12 +++
 include/migration/postcopy-ram.h |  35 +++++++
 include/qemu/typedefs.h          |   1 +
 migration/migration.c            |   1 +
 migration/postcopy-ram.c         | 108 +++++++++++++++++++++
 migration/ram.c                  | 203 ++++++++++++++++++++++++++++++++++++++-
 migration/savevm.c               |   2 -
 trace-events                     |   5 +
 8 files changed, 363 insertions(+), 4 deletions(-)
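
For readers following the PostcopyDiscardState logic, here is a rough
sketch of how ranges are converted to byte offsets within the RAM block
and batched into commands.  It is only an illustration: the toy_* names
are hypothetical, page_bits is passed in rather than read from
qemu_target_page_bits(), and the "send" is reduced to a counter.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_PER_COMMAND 12   /* mirrors MAX_DISCARDS_PER_COMMAND */

struct toy_pds {
    uint64_t block_first_page;   /* bitmap index of the block's first page */
    unsigned page_bits;          /* log2(target page size); assumed value */
    unsigned cur_entry;          /* ranges queued but not yet sent */
    unsigned nranges;            /* ranges queued in total */
    unsigned ncmds;              /* commands "sent" so far */
    uint64_t start_list[TOY_MAX_PER_COMMAND];
    uint64_t end_list[TOY_MAX_PER_COMMAND];
};

/* Send whatever is queued; a no-op if the queue is empty. */
static void toy_flush(struct toy_pds *p)
{
    if (p->cur_entry) {
        /* A real sender would write start_list/end_list to the stream. */
        p->ncmds++;
        p->cur_entry = 0;
    }
}

/*
 * Queue the inclusive page range first..last (bitmap indexes).
 * start_list holds the byte offset of the first page; end_list holds the
 * byte offset just past the last page, as in the patch.
 */
static void toy_queue_range(struct toy_pds *p, uint64_t first, uint64_t last)
{
    assert(first >= p->block_first_page && last >= first);
    p->start_list[p->cur_entry] =
        (first - p->block_first_page) << p->page_bits;
    p->end_list[p->cur_entry] =
        (last + 1 - p->block_first_page) << p->page_bits;
    p->cur_entry++;
    p->nranges++;

    if (p->cur_entry == TOY_MAX_PER_COMMAND) {
        toy_flush(p);   /* full set, ship it */
    }
}
```

A final toy_flush() at the end of each block plays the role of
postcopy_discard_send_finish().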

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 2a22381..4c6cf95 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -114,6 +114,13 @@ struct MigrationState
 
     /* Flag set once the migration has been asked to enter postcopy */
     bool start_postcopy;
+
+    /* Bitmap of pages that have been sent at least once.
+     * Currently only maintained and used in postcopy, where it's
+     * used to send the dirtymap at the start of the postcopy phase.
+     */
+    unsigned long *sentmap;
 };
 
 void process_incoming_migration(QEMUFile *f);
@@ -183,6 +190,11 @@ double xbzrle_mig_cache_miss_rate(void);
 
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
+/* For outgoing discard bitmap */
+int ram_postcopy_send_discard_bitmap(MigrationState *ms);
+/* For incoming postcopy discard */
+int ram_discard_range(MigrationIncomingState *mis, const char *block_name,
+                      uint64_t start, uint64_t end);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index d81934f..1d38f76 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -16,4 +16,39 @@
 /* Return true if the host supports everything we need to do postcopy-ram */
 bool postcopy_ram_supported_by_host(void);
 
+/*
+ * Discard the contents of memory start..end inclusive.
+ * We can assume that if we've been called postcopy_ram_hosttest returned true
+ */
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+                               uint8_t *end);
+
+
+/*
+ * Called at the start of each RAMBlock by the bitmap code
+ * 'offset' is the bitmap offset of the named RAMBlock in the migration
+ * bitmap.
+ * Returns a new PDS
+ */
+PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms,
+                                                 unsigned long offset,
+                                                 const char *name);
+
+/*
+ * Called by the bitmap code for each chunk to discard
+ * May send a discard message, may just leave it queued to
+ * be sent later
+ * 'start' and 'end' describe an inclusive range of pages in the
+ * migration bitmap in the RAM block passed to postcopy_discard_send_init
+ */
+void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
+                                unsigned long start, unsigned long end);
+
+/*
+ * Called at the end of each RAMBlock by the bitmap code
+ * Sends any outstanding discard messages, frees the PDS
+ */
+void postcopy_discard_send_finish(MigrationState *ms,
+                                  PostcopyDiscardState *pds);
+
 #endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 8403856..61b5b46 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -61,6 +61,7 @@ typedef struct PCIExpressHost PCIExpressHost;
 typedef struct PCIHostState PCIHostState;
 typedef struct PCMCIACardState PCMCIACardState;
 typedef struct PixelFormat PixelFormat;
+typedef struct PostcopyDiscardState PostcopyDiscardState;
 typedef struct PropertyInfo PropertyInfo;
 typedef struct Property Property;
 typedef struct QEMUBH QEMUBH;
diff --git a/migration/migration.c b/migration/migration.c
index 22be23e..180e8b9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -22,6 +22,7 @@
 #include "block/block.h"
 #include "qemu/sockets.h"
 #include "migration/block.h"
+#include "migration/postcopy-ram.h"
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index baf83f2..9c76472 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -27,6 +27,22 @@
 #include "qemu/error-report.h"
 #include "trace.h"
 
+#define MAX_DISCARDS_PER_COMMAND 12
+
+struct PostcopyDiscardState {
+    const char *name;
+    uint64_t offset; /* Bitmap entry for the 1st bit of this RAMBlock */
+    uint16_t cur_entry;
+    /*
+     * Start and end address of a discard range; end_list points to the byte
+     * after the end of the range.
+     */
+    uint64_t start_list[MAX_DISCARDS_PER_COMMAND];
+    uint64_t   end_list[MAX_DISCARDS_PER_COMMAND];
+    unsigned int nsentwords;
+    unsigned int nsentcmds;
+};
+
 /* Postcopy needs to detect accesses to pages that haven't yet been copied
  * across, and efficiently map new pages in, the techniques for doing this
  * are target OS specific.
@@ -145,6 +161,22 @@ out:
     return ret;
 }
 
+/*
+ * Discard the contents of memory start..end inclusive.
+ * We can assume that if we've been called postcopy_ram_hosttest returned true
+ */
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+                               uint8_t *end)
+{
+    trace_postcopy_ram_discard_range(start, end);
+    if (madvise(start, (end-start)+1, MADV_DONTNEED)) {
+        error_report("%s MADV_DONTNEED: %s", __func__, strerror(errno));
+        return -1;
+    }
+
+    return 0;
+}
+
 #else
 /* No target OS support, stubs just fail */
 
@@ -154,5 +186,81 @@ bool postcopy_ram_supported_by_host(void)
     return false;
 }
 
+int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
+                               uint8_t *end)
+{
+    assert(0);
+}
 #endif
 
+/* ------------------------------------------------------------------------- */
+
+/*
+ * Called at the start of each RAMBlock by the bitmap code
+ * 'offset' is the bitmap offset of the named RAMBlock in the migration
+ * bitmap.
+ * Returns a new PDS
+ */
+PostcopyDiscardState *postcopy_discard_send_init(MigrationState *ms,
+                                                 unsigned long offset,
+                                                 const char *name)
+{
+    PostcopyDiscardState *res = g_try_malloc(sizeof(PostcopyDiscardState));
+
+    if (res) {
+        res->name = name;
+        res->cur_entry = 0;
+        res->nsentwords = 0;
+        res->nsentcmds = 0;
+        res->offset = offset;
+    }
+
+    return res;
+}
+
+/*
+ * Called by the bitmap code for each chunk to discard
+ * May send a discard message, may just leave it queued to
+ * be sent later
+ * 'start' and 'end' describe an inclusive range of pages in the
+ * migration bitmap in the RAM block passed to postcopy_discard_send_init
+ */
+void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
+                                unsigned long start, unsigned long end)
+{
+    size_t tp_bits = qemu_target_page_bits();
+    /* Convert to byte offsets within the RAM block */
+    pds->start_list[pds->cur_entry] = (start - pds->offset) << tp_bits;
+    pds->end_list[pds->cur_entry] = (1 + end - pds->offset) << tp_bits;
+    pds->cur_entry++;
+    pds->nsentwords++;
+
+    if (pds->cur_entry == MAX_DISCARDS_PER_COMMAND) {
+        /* Full set, ship it! */
+        qemu_savevm_send_postcopy_ram_discard(ms->file, pds->name,
+                                              pds->cur_entry,
+                                              pds->start_list, pds->end_list);
+        pds->nsentcmds++;
+        pds->cur_entry = 0;
+    }
+}
+
+/*
+ * Called at the end of each RAMBlock by the bitmap code
+ * Sends any outstanding discard messages, frees the PDS
+ */
+void postcopy_discard_send_finish(MigrationState *ms, PostcopyDiscardState *pds)
+{
+    /* Anything unsent? */
+    if (pds->cur_entry) {
+        qemu_savevm_send_postcopy_ram_discard(ms->file, pds->name,
+                                              pds->cur_entry,
+                                              pds->start_list, pds->end_list);
+        pds->nsentcmds++;
+    }
+
+    trace_postcopy_discard_send_finish(pds->name, pds->nsentwords,
+                                       pds->nsentcmds);
+
+    g_free(pds);
+}
diff --git a/migration/ram.c b/migration/ram.c
index ff1a2fb..8e681f2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -32,6 +32,7 @@
 #include "qemu/timer.h"
 #include "qemu/main-loop.h"
 #include "migration/migration.h"
+#include "migration/postcopy-ram.h"
 #include "exec/address-spaces.h"
 #include "migration/page_cache.h"
 #include "qemu/error-report.h"
@@ -494,9 +495,17 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t **current_data,
     return 1;
 }
 
+/* mr: The region to search for dirty pages in
+ * start: Start address (typically so we can continue from previous page)
+ * ram_addr_abs: Pointer into which to store the address of the dirty page
+ *               within the global ram_addr space
+ *
+ * Returns: byte offset within memory region of the start of a dirty page
+ */
 static inline
 ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
-                                                 ram_addr_t start)
+                                                 ram_addr_t start,
+                                                 ram_addr_t *ram_addr_abs)
 {
     unsigned long base = mr->ram_addr >> TARGET_PAGE_BITS;
     unsigned long nr = base + (start >> TARGET_PAGE_BITS);
@@ -515,6 +524,7 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
         clear_bit(next, migration_bitmap);
         migration_dirty_pages--;
     }
+    *ram_addr_abs = next << TARGET_PAGE_BITS;
     return (next - base) << TARGET_PAGE_BITS;
 }
 
@@ -642,6 +652,19 @@ static int save_zero_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset,
     return pages;
 }
 
+static RAMBlock *ram_find_block(const char *id)
+{
+    RAMBlock *block;
+
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        if (!strcmp(id, block->idstr)) {
+            return block;
+        }
+    }
+
+    return NULL;
+}
+
 /**
  * ram_save_page: Send the given page to the stream
  *
@@ -921,13 +944,16 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
     bool complete_round = false;
     int pages = 0;
     MemoryRegion *mr;
+    ram_addr_t dirty_ram_abs; /* Address of the start of the dirty page in
+                                 ram_addr_t space */
 
     if (!block)
         block = QLIST_FIRST_RCU(&ram_list.blocks);
 
     while (true) {
         mr = block->mr;
-        offset = migration_bitmap_find_and_reset_dirty(mr, offset);
+        offset = migration_bitmap_find_and_reset_dirty(mr, offset,
+                                                       &dirty_ram_abs);
         if (complete_round && block == last_seen_block &&
             offset >= last_offset) {
             break;
@@ -958,6 +984,11 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
 
             /* if page is unmodified, continue to the next */
             if (pages > 0) {
+                MigrationState *ms = migrate_get_current();
+                if (ms->sentmap) {
+                    set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
+                }
+
                 last_sent_block = block;
                 break;
             }
@@ -1017,12 +1048,19 @@ void free_xbzrle_decoded_buf(void)
 
 static void migration_end(void)
 {
+    MigrationState *s = migrate_get_current();
+
     if (migration_bitmap) {
         memory_global_dirty_log_stop();
         g_free(migration_bitmap);
         migration_bitmap = NULL;
     }
 
+    if (s->sentmap) {
+        g_free(s->sentmap);
+        s->sentmap = NULL;
+    }
+
     XBZRLE_cache_lock();
     if (XBZRLE.cache) {
         cache_fini(XBZRLE.cache);
@@ -1090,6 +1128,161 @@ void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
     }
 }
 
+/* **** functions for postcopy ***** */
+
+/*
+ * Callback from postcopy_each_ram_send_discard for each RAMBlock
+ * start,end: Indexes into the bitmap for the first and last bit
+ *            representing the named block
+ */
+static int postcopy_send_discard_bm_ram(MigrationState *ms,
+                                        PostcopyDiscardState *pds,
+                                        unsigned long start, unsigned long end)
+{
+    unsigned long current;
+
+    for (current = start; current <= end; ) {
+        unsigned long set = find_next_bit(ms->sentmap, end + 1, current);
+
+        if (set <= end) {
+            unsigned long zero = find_next_zero_bit(ms->sentmap,
+                                                    end + 1, set + 1);
+
+            if (zero > end) {
+                zero = end + 1;
+            }
+            postcopy_discard_send_range(ms, pds, set, zero - 1);
+            current = zero + 1;
+        } else {
+            current = set;
+        }
+    }
+
+    return 0;
+}
+
+/*
+ * Utility for the outgoing postcopy code.
+ *   Calls postcopy_send_discard_bm_ram for each RAMBlock
+ *   passing it bitmap indexes and name.
+ * Returns: 0 on success
+ * (qemu_ram_foreach_block ends up passing unscaled lengths
+ *  which would mean postcopy code would have to deal with target page)
+ */
+static int postcopy_each_ram_send_discard(MigrationState *ms)
+{
+    struct RAMBlock *block;
+    int ret;
+
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        unsigned long first = block->offset >> TARGET_PAGE_BITS;
+        unsigned long last = (block->offset + (block->max_length-1))
+                                >> TARGET_PAGE_BITS;
+        PostcopyDiscardState *pds = postcopy_discard_send_init(ms,
+                                                               first,
+                                                               block->idstr);
+
+        /*
+         * Postcopy sends chunks of bitmap over the wire, but it
+         * just needs indexes at this point, avoids it having
+         * target page specific code.
+         */
+        ret = postcopy_send_discard_bm_ram(ms, pds, first, last);
+        postcopy_discard_send_finish(ms, pds);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+/*
+ * Transmit the set of pages to be discarded after precopy to the target.
+ * These are pages that were sent previously but have since been dirtied.
+ * Hopefully this is pretty sparse.
+ */
+int ram_postcopy_send_discard_bitmap(MigrationState *ms)
+{
+    int ret;
+
+    rcu_read_lock();
+    /* This should be our last sync, the src is now paused */
+    migration_bitmap_sync();
+
+    /*
+     * Update the sentmap to be sentmap &= dirty
+     */
+    bitmap_and(ms->sentmap, ms->sentmap, migration_bitmap,
+               last_ram_offset() >> TARGET_PAGE_BITS);
+
+
+    trace_ram_postcopy_send_discard_bitmap();
+#ifdef DEBUG_POSTCOPY
+    ram_debug_dump_bitmap(ms->sentmap, false);
+#endif
+
+    ret = postcopy_each_ram_send_discard(ms);
+    rcu_read_unlock();
+
+    return ret;
+}
+
+/*
+ * At the start of the postcopy phase of migration, any now-dirty
+ * precopied pages are discarded.
+ *
+ * start..end is an inclusive byte address range within the RAMBlock
+ *
+ * Returns 0 on success.
+ */
+int ram_discard_range(MigrationIncomingState *mis,
+                      const char *block_name,
+                      uint64_t start, uint64_t end)
+{
+    int ret = -1;
+
+    assert(end >= start);
+
+    rcu_read_lock();
+    RAMBlock *rb = ram_find_block(block_name);
+
+    if (!rb) {
+        error_report("ram_discard_range: Failed to find block '%s'",
+                     block_name);
+        goto err;
+    }
+
+    uint8_t *host_startaddr = rb->host + start;
+    uint8_t *host_endaddr;
+
+    if ((uintptr_t)host_startaddr & (qemu_host_page_size - 1)) {
+        error_report("ram_discard_range: Unaligned start address: %p",
+                     host_startaddr);
+        goto err;
+    }
+
+    if (end <= rb->used_length) {
+        host_endaddr   = rb->host + end;
+        if (((uintptr_t)host_endaddr + 1) & (qemu_host_page_size - 1)) {
+            error_report("ram_discard_range: Unaligned end address: %p",
+                         host_endaddr);
+            goto err;
+        }
+        ret = postcopy_ram_discard_range(mis, host_startaddr, host_endaddr);
+    } else {
+        error_report("ram_discard_range: Overrun block '%s' (%" PRIu64
+                     "/%" PRIu64 "/%zu)",
+                     block_name, start, end, rb->used_length);
+    }
+
+err:
+    rcu_read_unlock();
+
+    return ret;
+}
+
+
 /* Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
  * start to become numerous it will be necessary to reduce the
@@ -1147,6 +1340,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     migration_bitmap = bitmap_new(ram_bitmap_pages);
     bitmap_set(migration_bitmap, 0, ram_bitmap_pages);
 
+    if (migrate_postcopy_ram()) {
+        MigrationState *s = migrate_get_current();
+        s->sentmap = bitmap_new(ram_bitmap_pages);
+        bitmap_clear(s->sentmap, 0, ram_bitmap_pages);
+    }
+
     /*
      * Count the total number of pages used by ram blocks not including any
      * gaps due to alignment or unplugs.
diff --git a/migration/savevm.c b/migration/savevm.c
index b7f17b4..e6398dd 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1295,7 +1295,6 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
     }
     trace_loadvm_postcopy_ram_handle_discard_header(ramid, len);
     while (len) {
-        /* TODO - ram_discard_range gets added in a later patch
         uint64_t start_addr, end_addr;
         start_addr = qemu_get_be64(mis->file);
         end_addr = qemu_get_be64(mis->file);
@@ -1305,7 +1304,6 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
         if (ret) {
             return ret;
         }
-        */
     }
     trace_loadvm_postcopy_ram_handle_discard_end();
 
diff --git a/trace-events b/trace-events
index 72e9889..5e8a120 100644
--- a/trace-events
+++ b/trace-events
@@ -1231,6 +1231,7 @@ qemu_file_fclose(void) ""
 migration_bitmap_sync_start(void) ""
 migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64""
 migration_throttle(void) ""
+ram_postcopy_send_discard_bitmap(void) ""
 
 # hw/display/qxl.c
 disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"
@@ -1495,6 +1496,10 @@ rdma_start_incoming_migration_after_rdma_listen(void) ""
 rdma_start_outgoing_migration_after_rdma_connect(void) ""
 rdma_start_outgoing_migration_after_rdma_source_init(void) ""
 
+# migration/postcopy-ram.c
+postcopy_discard_send_finish(const char *ramblock, int nwords, int ncmds) "%s mask words sent=%d in %d commands"
+postcopy_ram_discard_range(void *start, void *end) "%p,%p"
+
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
 kvm_vm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
2.4.3


* [Qemu-devel] [PATCH v7 26/42] postcopy: Incoming initialisation
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (24 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 25/42] Postcopy: Maintain sentmap and calculate discard Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-13 12:04   ` Juan Quintela
  2015-07-22  6:19   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 27/42] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
                   ` (16 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
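As an illustration of what init_area does to each RAM block, here is a
small Linux-only sketch on an anonymous private mapping: MADV_DONTNEED
drops the current contents so the pages read back as zero, and
MADV_NOHUGEPAGE asks the kernel not to back the range with transparent
huge pages.  The toy_* helpers are hypothetical.  Unlike the patch, the
sketch tolerates EINVAL from MADV_NOHUGEPAGE, on the assumption that
kernels built without THP support reject that advice.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: empty the area and request normal-sized pages.
 * Returns 0 on success, -1 on failure (errno is left set by madvise).
 */
static int toy_prepare_area(void *addr, size_t length)
{
    if (madvise(addr, length, MADV_DONTNEED)) {
        return -1;
    }
#ifdef MADV_NOHUGEPAGE
    /* Assumption: EINVAL here just means THP is not available. */
    if (madvise(addr, length, MADV_NOHUGEPAGE) && errno != EINVAL) {
        return -1;
    }
#endif
    return 0;
}

/*
 * Map npages of anonymous memory, fill it with a pattern, prepare it,
 * and return the first byte afterwards (or -1 on any error).
 */
static int toy_demo(size_t npages)
{
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int first;

    if (p == MAP_FAILED) {
        return -1;
    }
    memset(p, 0xAA, len);
    first = toy_prepare_area(p, len) ? -1 : p[0];
    munmap(p, len);
    return first;
}
```

The zero read-back relies on the mapping being private and anonymous;
other mapping types behave differently under MADV_DONTNEED.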
 include/migration/migration.h    |   3 +
 include/migration/postcopy-ram.h |  12 ++++
 migration/postcopy-ram.c         | 116 +++++++++++++++++++++++++++++++++++++++
 migration/ram.c                  |  11 ++++
 migration/savevm.c               |   4 ++
 trace-events                     |   2 +
 6 files changed, 148 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 4c6cf95..98e2568 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -69,6 +69,8 @@ struct MigrationIncomingState {
      */
     QemuEvent      main_thread_load_event;
 
+    /* For the kernel to send us notifications */
+    int            userfault_fd;
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
     PostcopyState postcopy_state;
@@ -195,6 +197,7 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms);
 /* For incoming postcopy discard */
 int ram_discard_range(MigrationIncomingState *mis, const char *block_name,
                       uint64_t start, uint64_t end);
+int ram_postcopy_incoming_init(MigrationIncomingState *mis);
 
 /**
  * @migrate_add_blocker - prevent migration from proceeding
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 1d38f76..4192108 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -17,6 +17,18 @@
 bool postcopy_ram_supported_by_host(void);
 
 /*
+ * Initialise postcopy-ram, setting the RAM to a state where we can go into
+ * postcopy later; must be called prior to any precopy.
+ * called from ram.c's similarly named ram_postcopy_incoming_init
+ */
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages);
+
+/*
+ * At the end of a migration where postcopy_ram_incoming_init was called.
+ */
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis);
+
+/*
  * Discard the contents of memory start..end inclusive.
  * We can assume that if we've been called postcopy_ram_hosttest returned true
  */
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 9c76472..35c87b4 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -177,6 +177,111 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
     return 0;
 }
 
+/*
+ * Setup an area of RAM so that it *can* be used for postcopy later; this
+ * must be done right at the start prior to pre-copy.
+ * opaque should be the MIS.
+ */
+static int init_area(const char *block_name, void *host_addr,
+                     ram_addr_t offset, ram_addr_t length, void *opaque)
+{
+    MigrationIncomingState *mis = opaque;
+
+    trace_postcopy_init_area(block_name, host_addr, offset, length);
+
+    /*
+     * We need the whole of RAM to be truly empty for postcopy, so things
+     * like ROMs and any data tables built during init must be zero'd
+     * - we're going to get the copy from the source anyway.
+     * (Precopy will just overwrite this data, so doesn't need the discard)
+     */
+    if (postcopy_ram_discard_range(mis, host_addr, (host_addr + length - 1))) {
+        return -1;
+    }
+
+    /*
+     * We also need the area to be normal 4k pages, not huge pages
+     * (otherwise we can't be sure we can atomically place the
+     * 4k page in later).  THP might come along and map a 2MB page
+     * and when it's partially accessed in precopy it might not break
+     * it down, but leave a 2MB zero'd page.
+     */
+#ifdef MADV_NOHUGEPAGE
+    if (madvise(host_addr, length, MADV_NOHUGEPAGE)) {
+        error_report("%s: NOHUGEPAGE: %s", __func__, strerror(errno));
+        return -1;
+    }
+#endif
+
+    return 0;
+}
+
+/*
+ * At the end of migration, undo the effects of init_area
+ * opaque should be the MIS.
+ */
+static int cleanup_area(const char *block_name, void *host_addr,
+                        ram_addr_t offset, ram_addr_t length, void *opaque)
+{
+    MigrationIncomingState *mis = opaque;
+    struct uffdio_range range_struct;
+    trace_postcopy_cleanup_area(block_name, host_addr, offset, length);
+
+    /*
+     * We turned off hugepage for the precopy stage with postcopy enabled
+     * we can turn it back on now.
+     */
+#ifdef MADV_HUGEPAGE
+    if (madvise(host_addr, length, MADV_HUGEPAGE)) {
+        error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
+        return -1;
+    }
+#endif
+
+    /*
+     * We can also turn off userfault now since we should have all the
+     * pages.   It can be useful to leave it on to debug postcopy
+     * if you're not sure it's always getting every page.
+     */
+    range_struct.start = (uintptr_t)host_addr;
+    range_struct.len = length;
+
+    if (ioctl(mis->userfault_fd, UFFDIO_UNREGISTER, &range_struct)) {
+        error_report("%s: userfault unregister %s", __func__, strerror(errno));
+
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * Initialise postcopy-ram, setting the RAM to a state where we can go into
+ * postcopy later; must be called prior to any precopy.
+ * called from ram.c's similarly named ram_postcopy_incoming_init
+ */
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
+{
+    if (qemu_ram_foreach_block(init_area, mis)) {
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * At the end of a migration where postcopy_ram_incoming_init was called.
+ */
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
+{
+    /* TODO: Join the fault thread once we're sure it will exit */
+    if (qemu_ram_foreach_block(cleanup_area, mis)) {
+        return -1;
+    }
+
+    return 0;
+}
+
 #else
 /* No target OS support, stubs just fail */
 
@@ -186,6 +291,17 @@ bool postcopy_ram_supported_by_host(void)
     return false;
 }
 
+int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
+{
+    error_report("postcopy_ram_incoming_init: No OS support");
+    return -1;
+}
+
+int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
+{
+    assert(0);
+}
+
 int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
                                uint8_t *end)
 {
diff --git a/migration/ram.c b/migration/ram.c
index 8e681f2..f7d957e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1667,6 +1667,17 @@ static void decompress_data_with_multi_threads(uint8_t *compbuf,
     }
 }
 
+/*
+ * Allocate data structures etc needed by incoming migration with postcopy-ram
+ * postcopy-ram's similarly named postcopy_ram_incoming_init does the work
+ */
+int ram_postcopy_incoming_init(MigrationIncomingState *mis)
+{
+    size_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
+
+    return postcopy_ram_incoming_init(mis, ram_pages);
+}
+
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     int flags = 0, ret = 0;
diff --git a/migration/savevm.c b/migration/savevm.c
index e6398dd..f4de52d 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1238,6 +1238,10 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
         return -1;
     }
 
+    if (ram_postcopy_incoming_init(mis)) {
+        return -1;
+    }
+
     postcopy_state_set(mis, POSTCOPY_INCOMING_ADVISE);
 
     return 0;
diff --git a/trace-events b/trace-events
index 5e8a120..2ffc1c6 100644
--- a/trace-events
+++ b/trace-events
@@ -1498,7 +1498,9 @@ rdma_start_outgoing_migration_after_rdma_source_init(void) ""
 
 # migration/postcopy-ram.c
 postcopy_discard_send_finish(const char *ramblock, int nwords, int ncmds) "%s mask words sent=%d in %d commands"
+postcopy_cleanup_area(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
 postcopy_ram_discard_range(void *start, void *end) "%p,%p"
+postcopy_init_area(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
2.4.3


* [Qemu-devel] [PATCH v7 27/42] postcopy: ram_enable_notify to switch on userfault
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (25 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 26/42] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-13 12:10   ` Juan Quintela
  2015-07-23  5:22   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 28/42] Postcopy: Postcopy startup in migration thread Dr. David Alan Gilbert (git)
                   ` (15 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Mark the area of RAM as 'userfault'
Start up a fault-thread to handle any userfaults we might receive
from it (to be filled in later)

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
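The startup handshake in postcopy_ram_enable_notify (create the thread,
then wait on a semaphore until it reports that it is running) can be
sketched with plain POSIX threads as below.  This is an approximation
under stated assumptions: the toy_* names are hypothetical, pthreads and
sem_t stand in for QemuThread and QemuSemaphore, and the worker exits
immediately instead of looping on the userfault fd.  The sketch also
destroys the semaphore only after joining the worker, which is more
conservative than the patch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

struct toy_state {
    sem_t ready;     /* posted by the worker once it is running */
    int   started;   /* set by the worker before it posts */
};

static void *toy_fault_thread(void *opaque)
{
    struct toy_state *s = opaque;

    s->started = 1;
    sem_post(&s->ready);
    /* A real handler would now loop waiting for fault notifications. */
    return NULL;
}

/* Create the worker and block until it has signalled that it is up. */
static int toy_start_worker(struct toy_state *s, pthread_t *tid)
{
    s->started = 0;
    if (sem_init(&s->ready, 0, 0)) {
        return -1;
    }
    if (pthread_create(tid, NULL, toy_fault_thread, s)) {
        sem_destroy(&s->ready);
        return -1;
    }
    /* Retry if a signal interrupts the wait. */
    while (sem_wait(&s->ready) == -1 && errno == EINTR) {
    }
    return 0;
}

/* Start the worker, join it, clean up; returns the 'started' flag. */
static int toy_run(struct toy_state *s)
{
    pthread_t tid;

    if (toy_start_worker(s, &tid)) {
        return -1;
    }
    pthread_join(tid, NULL);
    sem_destroy(&s->ready);
    return s->started;
}
```

Waiting for the post before registering the RAM ranges means no fault
can arrive before a handler thread exists to service it.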
 include/migration/migration.h    |  3 ++
 include/migration/postcopy-ram.h |  6 ++++
 migration/postcopy-ram.c         | 69 +++++++++++++++++++++++++++++++++++++++-
 migration/savevm.c               |  9 ++++++
 4 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 98e2568..e6585c5 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -69,6 +69,9 @@ struct MigrationIncomingState {
      */
     QemuEvent      main_thread_load_event;
 
+    QemuThread     fault_thread;
+    QemuSemaphore  fault_thread_sem;
+
     /* For the kernel to send us notifications */
     int            userfault_fd;
     QEMUFile *return_path;
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 4192108..8a8616b 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -17,6 +17,12 @@
 bool postcopy_ram_supported_by_host(void);
 
 /*
+ * Make all of RAM sensitive to accesses to areas that haven't yet been written
+ * and wire up anything necessary to deal with it.
+ */
+int postcopy_ram_enable_notify(MigrationIncomingState *mis);
+
+/*
  * Initialise postcopy-ram, setting the RAM to a state where we can go into
  * postcopy later; must be called prior to any precopy.
  * called from ram.c's similarly named ram_postcopy_incoming_init
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 35c87b4..7158d08 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -282,9 +282,71 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
     return 0;
 }
 
+/*
+ * Mark the given area of RAM as requiring notification of accesses to
+ * unwritten areas.  Used as a callback on qemu_ram_foreach_block.
+ *   host_addr: Base of area to mark
+ *   offset: Offset in the whole ram arena
+ *   length: Length of the section
+ *   opaque: MigrationIncomingState pointer
+ * Returns 0 on success
+ */
+static int ram_block_enable_notify(const char *block_name, void *host_addr,
+                                   ram_addr_t offset, ram_addr_t length,
+                                   void *opaque)
+{
+    MigrationIncomingState *mis = opaque;
+    struct uffdio_register reg_struct;
+
+    reg_struct.range.start = (uintptr_t)host_addr;
+    reg_struct.range.len = length;
+    reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
+
+    /* Now tell our userfault_fd that it's responsible for this area */
+    if (ioctl(mis->userfault_fd, UFFDIO_REGISTER, &reg_struct)) {
+        error_report("%s userfault register: %s", __func__, strerror(errno));
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * Handle faults detected by the USERFAULT markings
+ */
+static void *postcopy_ram_fault_thread(void *opaque)
+{
+    MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
+
+    fprintf(stderr, "postcopy_ram_fault_thread\n");
+    /* TODO: In later patch */
+    qemu_sem_post(&mis->fault_thread_sem);
+    while (1) {
+        /* TODO: In later patch */
+    }
+
+    return NULL;
+}
+
+int postcopy_ram_enable_notify(MigrationIncomingState *mis)
+{
+    /* Create the fault handler thread and wait for it to be ready */
+    qemu_sem_init(&mis->fault_thread_sem, 0);
+    qemu_thread_create(&mis->fault_thread, "postcopy/fault",
+                       postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
+    qemu_sem_wait(&mis->fault_thread_sem);
+    qemu_sem_destroy(&mis->fault_thread_sem);
+
+    /* Mark so that we get notified of accesses to unwritten areas */
+    if (qemu_ram_foreach_block(ram_block_enable_notify, mis)) {
+        return -1;
+    }
+
+    return 0;
+}
+
 #else
 /* No target OS support, stubs just fail */
-
 bool postcopy_ram_supported_by_host(void)
 {
     error_report("%s: No OS support", __func__);
@@ -307,6 +369,11 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
 {
     assert(0);
 }
+
+int postcopy_ram_enable_notify(MigrationIncomingState *mis)
+{
+    assert(0);
+}
 #endif
 
 /* ------------------------------------------------------------------------- */
diff --git a/migration/savevm.c b/migration/savevm.c
index f4de52d..b87238a 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1324,6 +1324,15 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
         return -1;
     }
 
+    /*
+     * Sensitise RAM - can now generate requests for blocks that don't exist
+     * However, at this point the CPU shouldn't be running, and the IO
+     * shouldn't be doing anything yet so don't actually expect requests
+     */
+    if (postcopy_ram_enable_notify(mis)) {
+        return -1;
+    }
+
     /* TODO start up the postcopy listening thread */
     return 0;
 }
-- 
2.4.3

* [Qemu-devel] [PATCH v7 28/42] Postcopy: Postcopy startup in migration thread
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (26 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 27/42] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-13 12:56   ` Juan Quintela
  2015-07-23  5:55   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 29/42] Postcopy end in migration_thread Dr. David Alan Gilbert (git)
                   ` (14 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Rework the migration thread to set up and start postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |   3 +
 migration/migration.c         | 166 ++++++++++++++++++++++++++++++++++++++++--
 trace-events                  |   4 +
 3 files changed, 167 insertions(+), 6 deletions(-)
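
A rough sketch of the decision the reworked loop makes on each pass, for
readers following the hunk in migration_thread. The struct and enum here
are hypothetical stand-ins, not QEMU types; only the condition mirrors
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs; each field stands in for a value the loop reads. */
struct switch_inputs {
    bool postcopy_enabled;  /* stands in for migrate_postcopy_ram() */
    bool already_postcopy;  /* state == MIGRATION_STATUS_POSTCOPY_ACTIVE */
    bool start_requested;   /* stands in for s->start_postcopy */
    uint64_t pending_size;  /* total bytes still pending */
    uint64_t pend_nonpost;  /* pending bytes that cannot be postcopied */
    uint64_t max_size;      /* threshold the loop compares against */
};

enum next_step { STEP_ITERATE, STEP_START_POSTCOPY, STEP_COMPLETE };

/* Mirror of the branch structure in the patched loop body. */
static enum next_step choose_next_step(const struct switch_inputs *in)
{
    assert(in);
    if (in->pending_size && in->pending_size >= in->max_size) {
        /* Still a significant amount to transfer */
        if (in->postcopy_enabled && !in->already_postcopy &&
            in->pend_nonpost <= in->max_size && in->start_requested) {
            return STEP_START_POSTCOPY;
        }
        return STEP_ITERATE;
    }
    /* Little left pending: stop the VM and complete */
    return STEP_COMPLETE;
}
```

Note that in the patch a failed postcopy_start() still hits the
'continue', so the loop relies on the later file-error check to exit.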

diff --git a/include/migration/migration.h b/include/migration/migration.h
index e6585c5..68a1731 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -120,6 +120,9 @@ struct MigrationState
     /* Flag set once the migration has been asked to enter postcopy */
     bool start_postcopy;
 
+    /* Flag set once the migration thread is running (and needs joining) */
+    bool started_migration_thread;
+
     /* bitmap of pages that have been sent at least once
      * only maintained and used in postcopy at the moment
      * where it's used to send the dirtymap at the start
diff --git a/migration/migration.c b/migration/migration.c
index 180e8b9..8d15f33 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -557,7 +557,10 @@ static void migrate_fd_cleanup(void *opaque)
     if (s->file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
-        qemu_thread_join(&s->thread);
+        if (s->started_migration_thread) {
+            qemu_thread_join(&s->thread);
+            s->started_migration_thread = false;
+        }
         qemu_mutex_lock_iothread();
 
         migrate_compress_threads_join();
@@ -1021,7 +1024,6 @@ out:
     return NULL;
 }
 
-__attribute__ (( unused )) /* Until later in patch series */
 static int open_return_path_on_source(MigrationState *ms)
 {
 
@@ -1060,23 +1062,141 @@ static int await_return_path_close_on_source(MigrationState *ms)
 }
 
 /*
+ * Switch from normal iteration to postcopy
+ * Returns non-0 on error
+ */
+static int postcopy_start(MigrationState *ms, bool *old_vm_running)
+{
+    int ret;
+    const QEMUSizedBuffer *qsb;
+    int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    migrate_set_state(ms, MIGRATION_STATUS_ACTIVE,
+                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
+
+    trace_postcopy_start();
+    qemu_mutex_lock_iothread();
+    trace_postcopy_start_set_run();
+
+    qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
+    *old_vm_running = runstate_is_running();
+
+    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+
+    if (ret < 0) {
+        goto fail;
+    }
+
+    /*
+     * In FINISH_MIGRATE and with the iothread lock held everything should
+     * be quiet, but we've potentially still got dirty pages and we
+     * need to tell the destination to throw away any pages it's already
+     * received that are dirty
+     */
+    if (ram_postcopy_send_discard_bitmap(ms)) {
+        error_report("postcopy send discard bitmap failed");
+        goto fail;
+    }
+
+    /*
+     * send rest of state - note things that are doing postcopy
+     * will notice we're in POSTCOPY_ACTIVE and not actually
+     * wrap their state up here
+     */
+    qemu_file_set_rate_limit(ms->file, INT64_MAX);
+    /* Ping just for debugging, helps line traces up */
+    qemu_savevm_send_ping(ms->file, 2);
+
+    /*
+     * We need to leave the fd free for page transfers during the
+     * loading of the device state, so wrap all the remaining
+     * commands and state into a package that gets sent in one go
+     */
+    QEMUFile *fb = qemu_bufopen("w", NULL);
+    if (!fb) {
+        error_report("Failed to create buffered file");
+        goto fail;
+    }
+
+    qemu_savevm_state_complete_precopy(fb);
+    qemu_savevm_send_ping(fb, 3);
+
+    qemu_savevm_send_postcopy_run(fb);
+
+    /* <><> end of stuff going into the package */
+    qsb = qemu_buf_get(fb);
+
+    /* Now send that blob */
+    if (qemu_savevm_send_packaged(ms->file, qsb)) {
+        goto fail_closefb;
+    }
+    qemu_fclose(fb);
+    ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
+
+    qemu_mutex_unlock_iothread();
+
+    /*
+     * Although this ping is just for debug, it could potentially be
+     * used for getting a better measurement of downtime at the source.
+     */
+    qemu_savevm_send_ping(ms->file, 4);
+
+    ret = qemu_file_get_error(ms->file);
+    if (ret) {
+        error_report("postcopy_start: Migration stream errored");
+        migrate_set_state(ms, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                              MIGRATION_STATUS_FAILED);
+    }
+
+    return ret;
+
+fail_closefb:
+    qemu_fclose(fb);
+fail:
+    migrate_set_state(ms, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                          MIGRATION_STATUS_FAILED);
+    qemu_mutex_unlock_iothread();
+    return -1;
+}
+
+/*
  * Master migration thread on the source VM.
  * It drives the migration and pumps the data down the outgoing channel.
  */
 static void *migration_thread(void *opaque)
 {
     MigrationState *s = opaque;
+    /* Used by the bandwidth calcs, updated later */
     int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     int64_t initial_bytes = 0;
     int64_t max_size = 0;
     int64_t start_time = initial_time;
     bool old_vm_running = false;
+    bool entered_postcopy = false;
+    /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
+    enum MigrationStatus current_active_type = MIGRATION_STATUS_ACTIVE;
 
     qemu_savevm_state_header(s->file);
+
+    if (migrate_postcopy_ram()) {
+        /* Now tell the dest that it should open its end so it can reply */
+        qemu_savevm_send_open_return_path(s->file);
+
+        /* And do a ping that will make stuff easier to debug */
+        qemu_savevm_send_ping(s->file, 1);
+
+        /*
+         * Tell the destination that we *might* want to do postcopy later;
+         * if the other end can't do postcopy it should fail now, nice and
+         * early.
+         */
+        qemu_savevm_send_postcopy_advise(s->file);
+    }
+
     qemu_savevm_state_begin(s->file, &s->params);
 
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
+    current_active_type = MIGRATION_STATUS_ACTIVE;
     migrate_set_state(s, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_ACTIVE);
 
     trace_migration_thread_setup_complete();
@@ -1095,6 +1215,22 @@ static void *migration_thread(void *opaque)
             trace_migrate_pending(pending_size, max_size,
                                   pend_post, pend_nonpost);
             if (pending_size && pending_size >= max_size) {
+                /* Still a significant amount to transfer */
+
+                current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+                if (migrate_postcopy_ram() &&
+                    s->state != MIGRATION_STATUS_POSTCOPY_ACTIVE &&
+                    pend_nonpost <= max_size &&
+                    atomic_read(&s->start_postcopy)) {
+
+                    if (!postcopy_start(s, &old_vm_running)) {
+                        current_active_type = MIGRATION_STATUS_POSTCOPY_ACTIVE;
+                        entered_postcopy = true;
+                    }
+
+                    continue;
+                }
+                /* Just another iteration step */
                 qemu_savevm_state_iterate(s->file);
             } else {
                 int ret;
@@ -1126,8 +1262,8 @@ static void *migration_thread(void *opaque)
         }
 
         if (qemu_file_get_error(s->file)) {
-            migrate_set_state(s, MIGRATION_STATUS_ACTIVE,
-                              MIGRATION_STATUS_FAILED);
+            migrate_set_state(s, current_active_type, MIGRATION_STATUS_FAILED);
+            trace_migration_thread_file_err();
             break;
         }
         current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
@@ -1158,19 +1294,22 @@ static void *migration_thread(void *opaque)
         }
     }
 
+    trace_migration_thread_after_loop();
     qemu_mutex_lock_iothread();
     if (s->state == MIGRATION_STATUS_COMPLETED) {
         int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
         uint64_t transferred_bytes = qemu_ftell(s->file);
         s->total_time = end_time - s->total_time;
-        s->downtime = end_time - start_time;
+        if (!entered_postcopy) {
+            s->downtime = end_time - start_time;
+        }
         if (s->total_time) {
             s->mbps = (((double) transferred_bytes * 8.0) /
                        ((double) s->total_time)) / 1000;
         }
         runstate_set(RUN_STATE_POSTMIGRATE);
     } else {
-        if (old_vm_running) {
+        if (old_vm_running && !entered_postcopy) {
             vm_start();
         }
     }
@@ -1192,9 +1331,24 @@ void migrate_fd_connect(MigrationState *s)
     /* Notify before starting migration thread */
     notifier_list_notify(&migration_state_notifiers, s);
 
+    /*
+     * Open the return path; currently for postcopy but other things might
+     * also want it.
+     */
+    if (migrate_postcopy_ram()) {
+        if (open_return_path_on_source(s)) {
+            error_report("Unable to open return-path for postcopy");
+            migrate_set_state(s, MIGRATION_STATUS_SETUP,
+                              MIGRATION_STATUS_FAILED);
+            migrate_fd_cleanup(s);
+            return;
+        }
+    }
+
     migrate_compress_threads_create();
     qemu_thread_create(&s->thread, "migration", migration_thread, s,
                        QEMU_THREAD_JOINABLE);
+    s->started_migration_thread = true;
 }
 
 PostcopyState  postcopy_state_get(MigrationIncomingState *mis)
diff --git a/trace-events b/trace-events
index 2ffc1c6..f096877 100644
--- a/trace-events
+++ b/trace-events
@@ -1422,9 +1422,13 @@ migrate_fd_error(void) ""
 migrate_fd_cancel(void) ""
 migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
+migration_thread_after_loop(void) ""
+migration_thread_file_err(void) ""
 migration_thread_setup_complete(void) ""
 open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
+postcopy_start(void) ""
+postcopy_start_set_run(void) ""
 source_return_path_thread_bad_end(void) ""
 source_return_path_thread_end(void) ""
 source_return_path_thread_entry(void) ""
-- 
2.4.3

* [Qemu-devel] [PATCH v7 29/42] Postcopy end in migration_thread
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (27 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 28/42] Postcopy: Postcopy startup in migration thread Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-13 13:15   ` Juan Quintela
  2015-07-23  6:41   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 30/42] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command Dr. David Alan Gilbert (git)
                   ` (13 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The end of migration in postcopy is a bit different since some of
the things normally done at the end of migration have already been
done on the transition to postcopy.

The end of migration code is getting a bit complicated now, so
move it out into its own function.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 91 +++++++++++++++++++++++++++++++++++++--------------
 trace-events          |  6 ++++
 2 files changed, 72 insertions(+), 25 deletions(-)
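
As a reading aid, the outcome of the new end-of-iteration function can be
summarised as below. This is only a sketch: the enum and struct are
hypothetical, and the real function also performs the VM stop, the state
completion calls and the return-path wait rather than taking their
results as inputs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified states, not QEMU's MigrationStatus values. */
enum mig_state { ST_ACTIVE, ST_POSTCOPY_ACTIVE, ST_COMPLETED, ST_FAILED };

struct end_inputs {
    enum mig_state state;  /* state on entry */
    int vm_stop_ret;       /* result of stopping the VM (precopy only) */
    bool rp_open;          /* return path opened, i.e. postcopy enabled */
    int rp_error;          /* non-zero if the return path ended badly */
    int file_error;        /* non-zero if the migration stream errored */
};

/* Final state the function would set, given the results of each step. */
static enum mig_state end_of_iteration_result(const struct end_inputs *in)
{
    assert(in);
    if (in->state == ST_ACTIVE && in->vm_stop_ret < 0) {
        return ST_FAILED;   /* could not stop the VM */
    }
    if (in->rp_open && in->rp_error) {
        return ST_FAILED;   /* destination reported a problem */
    }
    if (in->file_error) {
        return ST_FAILED;   /* stream error */
    }
    return ST_COMPLETED;
}
```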

diff --git a/migration/migration.c b/migration/migration.c
index 8d15f33..3e5a7c8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1041,7 +1041,6 @@ static int open_return_path_on_source(MigrationState *ms)
     return 0;
 }
 
-__attribute__ (( unused )) /* Until later in patch series */
 /* Returns 0 if the RP was ok, otherwise there was an error on the RP */
 static int await_return_path_close_on_source(MigrationState *ms)
 {
@@ -1159,6 +1158,68 @@ fail:
 }
 
 /*
+ * Used by migration_thread when there's not much left pending.
+ * The caller 'breaks' the loop when this returns.
+ */
+static void migration_thread_end_of_iteration(MigrationState *s,
+                                              int current_active_state,
+                                              bool *old_vm_running,
+                                              int64_t *start_time)
+{
+    int ret;
+    if (s->state == MIGRATION_STATUS_ACTIVE) {
+        qemu_mutex_lock_iothread();
+        *start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+        qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
+        *old_vm_running = runstate_is_running();
+
+        ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+        if (ret >= 0) {
+            qemu_file_set_rate_limit(s->file, INT64_MAX);
+            qemu_savevm_state_complete_precopy(s->file);
+        }
+        qemu_mutex_unlock_iothread();
+
+        if (ret < 0) {
+            goto fail;
+        }
+    } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+        trace_migration_thread_end_of_iteration_postcopy_end();
+
+        qemu_savevm_state_complete_postcopy(s->file);
+        trace_migration_thread_end_of_iteration_postcopy_end_after_complete();
+    }
+
+    /*
+     * If rp was opened we must clean up the thread before
+     * cleaning everything else up (since if there are no failures
+     * it will wait for the destination to send its status in
+     * a SHUT command).
+     * Postcopy opens rp if enabled (even if it's not activated)
+     */
+    if (migrate_postcopy_ram()) {
+        int rp_error;
+        trace_migration_thread_end_of_iteration_postcopy_end_before_rp();
+        rp_error = await_return_path_close_on_source(s);
+        trace_migration_thread_end_of_iteration_postcopy_end_after_rp(rp_error);
+        if (rp_error) {
+            goto fail;
+        }
+    }
+
+    if (qemu_file_get_error(s->file)) {
+        trace_migration_thread_end_of_iteration_file_err();
+        goto fail;
+    }
+
+    migrate_set_state(s, current_active_state, MIGRATION_STATUS_COMPLETED);
+    return;
+
+fail:
+    migrate_set_state(s, current_active_state, MIGRATION_STATUS_FAILED);
+}
+
+/*
  * Master migration thread on the source VM.
  * It drives the migration and pumps the data down the outgoing channel.
  */
@@ -1233,31 +1294,11 @@ static void *migration_thread(void *opaque)
                 /* Just another iteration step */
                 qemu_savevm_state_iterate(s->file);
             } else {
-                int ret;
-
-                qemu_mutex_lock_iothread();
-                start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-                qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
-                old_vm_running = runstate_is_running();
-
-                ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
-                if (ret >= 0) {
-                    qemu_file_set_rate_limit(s->file, INT64_MAX);
-                    qemu_savevm_state_complete_precopy(s->file);
-                }
-                qemu_mutex_unlock_iothread();
+                trace_migration_thread_low_pending(pending_size);
 
-                if (ret < 0) {
-                    migrate_set_state(s, MIGRATION_STATUS_ACTIVE,
-                                      MIGRATION_STATUS_FAILED);
-                    break;
-                }
-
-                if (!qemu_file_get_error(s->file)) {
-                    migrate_set_state(s, MIGRATION_STATUS_ACTIVE,
-                                      MIGRATION_STATUS_COMPLETED);
-                    break;
-                }
+                migration_thread_end_of_iteration(s, current_active_type,
+                    &old_vm_running, &start_time);
+                break;
             }
         }
 
diff --git a/trace-events b/trace-events
index f096877..528d5a3 100644
--- a/trace-events
+++ b/trace-events
@@ -1425,6 +1425,12 @@ migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
 migration_thread_after_loop(void) ""
 migration_thread_file_err(void) ""
 migration_thread_setup_complete(void) ""
+migration_thread_low_pending(uint64_t pending) "%" PRIu64
+migration_thread_end_of_iteration_file_err(void) ""
+migration_thread_end_of_iteration_postcopy_end(void) ""
+migration_thread_end_of_iteration_postcopy_end_after_complete(void) ""
+migration_thread_end_of_iteration_postcopy_end_before_rp(void) ""
+migration_thread_end_of_iteration_postcopy_end_after_rp(int rp_error) "%d"
 open_return_path_on_source(void) ""
 open_return_path_on_source_continue(void) ""
 postcopy_start(void) ""
-- 
2.4.3

* [Qemu-devel] [PATCH v7 30/42] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (28 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 29/42] Postcopy end in migration_thread Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-13 13:24   ` Juan Quintela
  2015-07-23  6:50   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 31/42] Page request: Process incoming page request Dr. David Alan Gilbert (git)
                   ` (12 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add MIG_RP_MSG_REQ_PAGES command on Return path for the postcopy
destination to request a page from the source.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  4 +++
 migration/migration.c         | 70 +++++++++++++++++++++++++++++++++++++++++++
 trace-events                  |  1 +
 3 files changed, 75 insertions(+)
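
For illustration, the message layout used here is: start (be64), len
(be64) with bit 0 set when a name follows, then an optional 1-byte name
length and the name itself. The standalone sketch below encodes and
decodes that layout byte by byte; the function and macro names are made
up for this note and it does not use QEMU's helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REQ_FIXED_LEN 16   /* start (8 bytes) + len (8 bytes) */
#define REQ_MAX_NAME  255  /* name length is carried in one byte */
#define REQ_MAX_LEN   (REQ_FIXED_LEN + 1 + REQ_MAX_NAME)

static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Encode into buf (at least REQ_MAX_LEN bytes); returns the message length. */
static size_t req_pages_encode(uint8_t *buf, const char *rbname,
                               uint64_t start, uint64_t len)
{
    size_t msglen = REQ_FIXED_LEN;

    assert(!(len & 1));            /* bit 0 is reserved for the name flag */
    if (rbname) {
        size_t n = strlen(rbname);
        assert(n <= REQ_MAX_NAME);
        len |= 1;                  /* flag: a name follows */
        buf[msglen++] = (uint8_t)n;
        memcpy(buf + msglen, rbname, n);
        msglen += n;
    }
    put_be64(buf, start);
    put_be64(buf + 8, len);
    return msglen;
}

/*
 * Decode; name must hold REQ_MAX_NAME + 1 bytes.
 * Returns 0 on success, -1 if msglen does not match the contents.
 */
static int req_pages_decode(const uint8_t *buf, size_t msglen,
                            uint64_t *start, uint64_t *len,
                            char *name, int *has_name)
{
    size_t n;

    *has_name = 0;
    name[0] = '\0';
    if (msglen < REQ_FIXED_LEN) {
        return -1;
    }
    *start = get_be64(buf);
    *len = get_be64(buf + 8);
    if (!(*len & 1)) {
        return msglen == REQ_FIXED_LEN ? 0 : -1;
    }
    *len &= ~(uint64_t)1;          /* remove the flag */
    if (msglen < REQ_FIXED_LEN + 1) {
        return -1;
    }
    n = buf[REQ_FIXED_LEN];
    if (msglen != REQ_FIXED_LEN + 1 + n) {
        return -1;                 /* reject before touching the name */
    }
    memcpy(name, buf + REQ_FIXED_LEN + 1, n);
    name[n] = '\0';
    *has_name = 1;
    return 0;
}
```

The decoder in this sketch checks the length before it reads the name,
which is a choice made for the example rather than a description of the
receive path in the patch.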

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 68a1731..8742d53 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -47,6 +47,8 @@ enum mig_rp_message_type {
     MIG_RP_MSG_INVALID = 0,  /* Must be 0 */
     MIG_RP_MSG_SHUT,         /* sibling will not send any more RP messages */
     MIG_RP_MSG_PONG,         /* Response to a PING; data (seq: be32 ) */
+
+    MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be64) */
 };
 
 typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
@@ -246,6 +248,8 @@ void migrate_send_rp_shut(MigrationIncomingState *mis,
                           uint32_t value);
 void migrate_send_rp_pong(MigrationIncomingState *mis,
                           uint32_t value);
+void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
+                              ram_addr_t start, ram_addr_t len);
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
diff --git a/migration/migration.c b/migration/migration.c
index 3e5a7c8..0373b77 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -113,6 +113,36 @@ static void deferred_incoming_migration(Error **errp)
     deferred_incoming = true;
 }
 
+/* Request a range of pages from the source VM at the given
+ * start address.
+ *   rbname: Name of the RAMBlock to request the page in, if NULL it's the same
+ *           as the last request (a name must have been given previously)
+ *   start: Address offset within the RB
+ *   len: Length in bytes required - must be a multiple of pagesize
+ */
+void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
+                               ram_addr_t start, ram_addr_t len)
+{
+    uint8_t bufc[16+1+255]; /* start (8), len (8), name len (1), rbname <= 255 */
+    uint64_t *buf64 = (uint64_t *)bufc;
+    size_t msglen = 16; /* start + len */
+
+    assert(!(len & 1));
+    if (rbname) {
+        int rbname_len = strlen(rbname);
+        assert(rbname_len < 256);
+
+        len |= 1; /* Flag to say we've got a name */
+        bufc[msglen++] = rbname_len;
+        memcpy(bufc + msglen, rbname, rbname_len);
+        msglen += rbname_len;
+    }
+
+    buf64[0] = cpu_to_be64((uint64_t)start);
+    buf64[1] = cpu_to_be64((uint64_t)len);
+    migrate_send_rp_message(mis, MIG_RP_MSG_REQ_PAGES, msglen, bufc);
+}
+
 void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p;
@@ -939,6 +969,17 @@ static void source_return_path_bad(MigrationState *s)
 }
 
 /*
+ * Process a request for pages received on the return path.
+ * We're allowed to send more than requested (e.g. to round to our page size)
+ * and we don't need to send pages that have already been sent.
+ */
+static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
+                                       ram_addr_t start, ram_addr_t len)
+{
+    trace_migrate_handle_rp_req_pages(rbname, start, len);
+}
+
+/*
  * Handles messages sent on the return path towards the source VM
  *
  */
@@ -950,6 +991,8 @@ static void *source_return_path_thread(void *opaque)
     const int max_len = 512;
     uint8_t buf[max_len];
     uint32_t tmp32;
+    ram_addr_t start, len;
+    char *tmpstr;
     int res;
 
     trace_source_return_path_thread_entry();
@@ -965,6 +1008,11 @@ static void *source_return_path_thread(void *opaque)
             expected_len = 4;
             break;
 
+        case MIG_RP_MSG_REQ_PAGES:
+            /* 16 byte start/len _possibly_ plus an id str */
+            expected_len = 16 + 256;
+            break;
+
         default:
             error_report("RP: Received invalid message 0x%04x length 0x%04x",
                     header_type, header_len);
@@ -1010,6 +1058,28 @@ static void *source_return_path_thread(void *opaque)
             trace_source_return_path_thread_pong(tmp32);
             break;
 
+        case MIG_RP_MSG_REQ_PAGES:
+            start = be64_to_cpup((uint64_t *)buf);
+            len = be64_to_cpup(((uint64_t *)buf)+1);
+            tmpstr = NULL;
+            if (len & 1) {
+                len -= 1; /* Remove the flag */
+                /* Now we expect an idstr */
+                tmp32 = buf[16]; /* Length of the following idstr */
+                tmpstr = (char *)&buf[17];
+                buf[17+tmp32] = '\0';
+                expected_len = 16+1+tmp32;
+            } else {
+                expected_len = 16;
+            }
+            if (header_len != expected_len) {
+                error_report("RP: Req_Page with length %d expecting %d",
+                        header_len, expected_len);
+                source_return_path_bad(ms);
+            }
+            migrate_handle_rp_req_pages(ms, tmpstr, start, len);
+            break;
+
         default:
             break;
         }
diff --git a/trace-events b/trace-events
index 528d5a3..328d64c 100644
--- a/trace-events
+++ b/trace-events
@@ -1422,6 +1422,7 @@ migrate_fd_error(void) ""
 migrate_fd_cancel(void) ""
 migrate_pending(uint64_t size, uint64_t max, uint64_t post, uint64_t nonpost) "pending size %" PRIu64 " max %" PRIu64 " (post=%" PRIu64 " nonpost=%" PRIu64 ")"
 migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
+migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at %zx len %zx"
 migration_thread_after_loop(void) ""
 migration_thread_file_err(void) ""
 migration_thread_setup_complete(void) ""
-- 
2.4.3

* [Qemu-devel] [PATCH v7 31/42] Page request: Process incoming page request
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (29 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 30/42] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-14  9:18   ` Juan Quintela
  2015-07-23 12:23   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 32/42] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
                   ` (11 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

On receiving MIG_RP_MSG_REQ_PAGES look up the address and
queue the page.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h | 21 +++++++++++++++
 migration/migration.c         | 36 +++++++++++++++++++++++++
 migration/ram.c               | 63 +++++++++++++++++++++++++++++++++++++++++++
 trace-events                  |  1 +
 4 files changed, 121 insertions(+)
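
The sanity check added to migrate_handle_rp_req_pages boils down to a
mask test on start and len. A minimal standalone sketch, with the page
size passed in explicitly instead of read from getpagesize() (the
function name is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if both start and len are multiples of host_page_size.
 * The mask trick only works for power-of-two page sizes.
 */
static bool req_is_host_page_aligned(uint64_t start, uint64_t len,
                                     uint64_t host_page_size)
{
    uint64_t mask;

    assert(host_page_size && !(host_page_size & (host_page_size - 1)));
    mask = host_page_size - 1;
    return !(start & mask) && !(len & mask);
}
```

With a 4096-byte page, a request at 0x2000 for 0x1000 bytes passes, while
one at 0x2100 or one for 0x800 bytes would be rejected as misaligned.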

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 8742d53..20d1b39 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -88,6 +88,18 @@ MigrationIncomingState *migration_incoming_get_current(void);
 MigrationIncomingState *migration_incoming_state_new(QEMUFile *f);
 void migration_incoming_state_destroy(void);
 
+/*
+ * An outstanding page request, on the source, having been received
+ * and queued
+ */
+struct MigrationSrcPageRequest {
+    RAMBlock *rb;
+    hwaddr    offset;
+    hwaddr    len;
+
+    QSIMPLEQ_ENTRY(MigrationSrcPageRequest) next_req;
+};
+
 struct MigrationState
 {
     int64_t bandwidth_limit;
@@ -131,6 +143,12 @@ struct MigrationState
      * of the postcopy phase
      */
     unsigned long *sentmap;
+
+    /* Queue of outstanding page requests from the destination */
+    QemuMutex src_page_req_mutex;
+    QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
+    /* The RAMBlock used in the last src_page_request */
+    RAMBlock *last_req_rb;
 };
 
 void process_incoming_migration(QEMUFile *f);
@@ -272,6 +290,9 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t block_offset,
 void ram_mig_init(void);
 void savevm_skip_section_footers(void);
 
+int ram_save_queue_pages(MigrationState *ms, const char *rbname,
+                         ram_addr_t start, ram_addr_t len);
+
 PostcopyState postcopy_state_get(MigrationIncomingState *mis);
 
 /* Set the state and return the old state */
diff --git a/migration/migration.c b/migration/migration.c
index 0373b77..7fa982e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -26,6 +26,8 @@
 #include "qemu/thread.h"
 #include "qmp-commands.h"
 #include "trace.h"
+#include "exec/memory.h"
+#include "exec/address-spaces.h"
 
 #define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
 
@@ -57,6 +59,7 @@ static bool deferred_incoming;
 /* For outgoing */
 MigrationState *migrate_get_current(void)
 {
+    static bool once;
     static MigrationState current_migration = {
         .state = MIGRATION_STATUS_NONE,
         .bandwidth_limit = MAX_THROTTLE,
@@ -70,6 +73,10 @@ MigrationState *migrate_get_current(void)
                 DEFAULT_MIGRATE_DECOMPRESS_THREAD_COUNT,
     };
 
+    if (!once) {
+        qemu_mutex_init(&current_migration.src_page_req_mutex);
+        once = true;
+    }
     return &current_migration;
 }
 
@@ -584,6 +591,15 @@ static void migrate_fd_cleanup(void *opaque)
 
     migrate_fd_cleanup_src_rp(s);
 
+    /* This queue generally should be empty - but in the case of a failed
+     * migration might have some droppings in.
+     */
+    struct MigrationSrcPageRequest *mspr, *next_mspr;
+    QSIMPLEQ_FOREACH_SAFE(mspr, &s->src_page_requests, next_req, next_mspr) {
+        QSIMPLEQ_REMOVE_HEAD(&s->src_page_requests, next_req);
+        g_free(mspr);
+    }
+
     if (s->file) {
         trace_migrate_fd_cleanup();
         qemu_mutex_unlock_iothread();
@@ -713,6 +729,8 @@ MigrationState *migrate_init(const MigrationParams *params)
     s->state = MIGRATION_STATUS_SETUP;
     trace_migrate_set_state(MIGRATION_STATUS_SETUP);
 
+    QSIMPLEQ_INIT(&s->src_page_requests);
+
     s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     return s;
 }
@@ -976,7 +994,25 @@ static void source_return_path_bad(MigrationState *s)
 static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
                                        ram_addr_t start, ram_addr_t len)
 {
+    long our_host_ps = getpagesize();
+
     trace_migrate_handle_rp_req_pages(rbname, start, len);
+
+    /*
+     * Since we currently insist on matching page sizes, just sanity check
+     * we're being asked for whole host pages.
+     */
+    if (start & (our_host_ps-1) ||
+       (len & (our_host_ps-1))) {
+        error_report("%s: Misaligned page request, start: " RAM_ADDR_FMT
+                     " len: " RAM_ADDR_FMT, __func__, start, len);
+        source_return_path_bad(ms);
+        return;
+    }
+
+    if (ram_save_queue_pages(ms, rbname, start, len)) {
+        source_return_path_bad(ms);
+    }
 }
 
 /*
diff --git a/migration/ram.c b/migration/ram.c
index f7d957e..da3e9ea 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -924,6 +924,69 @@ static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block,
 }
 
 /**
+ * Queue the pages for transmission, e.g. a request from postcopy destination
+ *   ms: MigrationState in which the queue is held
+ *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
+ *   start: Offset from the start of the RAMBlock
+ *   len: Length (in bytes) to send
+ *   Return: 0 on success
+ */
+int ram_save_queue_pages(MigrationState *ms, const char *rbname,
+                         ram_addr_t start, ram_addr_t len)
+{
+    RAMBlock *ramblock;
+
+    rcu_read_lock();
+    if (!rbname) {
+        /* Reuse last RAMBlock */
+        ramblock = ms->last_req_rb;
+
+        if (!ramblock) {
+            /*
+             * Shouldn't happen, we can't reuse the last RAMBlock if
+             * it's the 1st request.
+             */
+            error_report("ram_save_queue_pages no previous block");
+            goto err;
+        }
+    } else {
+        ramblock = ram_find_block(rbname);
+
+        if (!ramblock) {
+            /* We shouldn't be asked for a non-existent RAMBlock */
+            error_report("ram_save_queue_pages no block '%s'", rbname);
+            goto err;
+        }
+    }
+    trace_ram_save_queue_pages(ramblock->idstr, start, len);
+    if (start+len > ramblock->used_length) {
+        error_report("%s request overrun start=%zx len=%zx blocklen=%zx",
+                     __func__, start, len, ramblock->used_length);
+        goto err;
+    }
+
+    struct MigrationSrcPageRequest *new_entry =
+        g_malloc0(sizeof(struct MigrationSrcPageRequest));
+    new_entry->rb = ramblock;
+    new_entry->offset = start;
+    new_entry->len = len;
+    ms->last_req_rb = ramblock;
+
+    qemu_mutex_lock(&ms->src_page_req_mutex);
+    memory_region_ref(ramblock->mr);
+    QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
+    qemu_mutex_unlock(&ms->src_page_req_mutex);
+    rcu_read_unlock();
+
+    return 0;
+
+err:
+    rcu_read_unlock();
+    return -1;
+}
+
+
+/**
  * ram_find_and_save_block: Finds a dirty page and sends it to f
  *
  * Called within an RCU critical section.
diff --git a/trace-events b/trace-events
index 328d64c..04f6682 100644
--- a/trace-events
+++ b/trace-events
@@ -1232,6 +1232,7 @@ migration_bitmap_sync_start(void) ""
 migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64""
 migration_throttle(void) ""
 ram_postcopy_send_discard_bitmap(void) ""
+ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
 
 # hw/display/qxl.c
 disable qxl_interface_set_mm_time(int qid, uint32_t mm_time) "%d %d"
-- 
2.4.3


* [Qemu-devel] [PATCH v7 32/42] Page request: Consume pages off the post-copy queue
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (30 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 31/42] Page request: Process incoming page request Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-14  9:40   ` Juan Quintela
  2015-07-27  6:05   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 33/42] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
                   ` (10 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When transmitting RAM pages, consume pages that have been queued by
MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.

Note:
  a) After a queued page is sent, the linear walk carries on from just
after that page; there is a reasonable chance that the destination
was about to ask for other nearby pages anyway.

  b) We have to be careful about any assumptions that the page-walking
code makes; in particular, it takes some shortcuts on its first linear
walk that break as soon as we send a queued page.

  c) We have to be careful not to break up host-page-sized chunks, since
that makes it harder to place the pages on the destination.
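
As a rough illustration of the notes above (not the code in this patch),
the sketch below shows a queued request being consumed one target page at
a time, and a walk that runs on to the end of the enclosing host page so
that host pages are never split. The sketch_* names and the 4 KiB/16 KiB
page sizes are assumptions chosen purely for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sizes only: 4 KiB target pages inside 16 KiB host pages. */
#define SKETCH_TARGET_PAGE_SIZE 4096u
#define SKETCH_HOST_PAGE_SIZE   16384u

/* Hypothetical stand-in for one queued page request. */
struct sketch_request {
    uint64_t offset; /* byte offset of the next target page to send */
    uint64_t len;    /* bytes still outstanding in this request */
};

/*
 * Take one target page off the request.  The page's offset is stored in
 * *offset.  Returns 1 if the request is now fully consumed (the caller
 * would drop it from the queue), 0 if more pages remain.
 */
static int sketch_unqueue_page(struct sketch_request *req, uint64_t *offset)
{
    assert(req->len > 0);
    *offset = req->offset;
    if (req->len > SKETCH_TARGET_PAGE_SIZE) {
        req->len -= SKETCH_TARGET_PAGE_SIZE;
        req->offset += SKETCH_TARGET_PAGE_SIZE;
        return 0;
    }
    req->len = 0;
    return 1;
}

/*
 * Walk from 'start' to the end of the host page that contains it.
 * Returns the number of target pages visited; *last receives the offset
 * of the last target page visited.
 */
static unsigned sketch_walk_host_page(uint64_t start, uint64_t *last)
{
    uint64_t offset = start;
    unsigned visited = 0;

    do {
        visited++;
        offset += SKETCH_TARGET_PAGE_SIZE;
    } while (offset & (SKETCH_HOST_PAGE_SIZE - 1));

    *last = offset - SKETCH_TARGET_PAGE_SIZE;
    return visited;
}
```

With these sizes, a walk that starts half way through a host page visits
only the remaining two target pages, which matches the intent of note c).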

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/ram.c | 227 ++++++++++++++++++++++++++++++++++++++++++++------------
 trace-events    |   2 +
 2 files changed, 183 insertions(+), 46 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index da3e9ea..316834b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -222,6 +222,7 @@ static RAMBlock *last_seen_block;
 /* This is the last block from where we have sent data */
 static RAMBlock *last_sent_block;
 static ram_addr_t last_offset;
+static bool last_was_from_queue;
 static unsigned long *migration_bitmap;
 static uint64_t migration_dirty_pages;
 static uint32_t last_version;
@@ -503,9 +504,9 @@ static int save_xbzrle_page(QEMUFile *f, uint8_t **current_data,
  * Returns: byte offset within memory region of the start of a dirty page
  */
 static inline
-ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
-                                                 ram_addr_t start,
-                                                 ram_addr_t *ram_addr_abs)
+ram_addr_t migration_bitmap_find_dirty(MemoryRegion *mr,
+                                       ram_addr_t start,
+                                       ram_addr_t *ram_addr_abs)
 {
     unsigned long base = mr->ram_addr >> TARGET_PAGE_BITS;
     unsigned long nr = base + (start >> TARGET_PAGE_BITS);
@@ -520,14 +521,23 @@ ram_addr_t migration_bitmap_find_and_reset_dirty(MemoryRegion *mr,
         next = find_next_bit(migration_bitmap, size, nr);
     }
 
-    if (next < size) {
-        clear_bit(next, migration_bitmap);
-        migration_dirty_pages--;
-    }
     *ram_addr_abs = next << TARGET_PAGE_BITS;
     return (next - base) << TARGET_PAGE_BITS;
 }
 
+static inline bool migration_bitmap_clear_dirty(ram_addr_t addr)
+{
+    bool ret;
+    unsigned long nr = addr >> TARGET_PAGE_BITS;
+
+    ret = test_and_clear_bit(nr, migration_bitmap);
+
+    if (ret) {
+        migration_dirty_pages--;
+    }
+    return ret;
+}
+
 static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
 {
     migration_dirty_pages +=
@@ -923,6 +933,41 @@ static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block,
     return pages;
 }
 
+/*
+ * Unqueue a page from the queue fed by postcopy page requests
+ *
+ * Returns:      The RAMBlock* to transmit from (or NULL if the queue is empty)
+ *      ms:      MigrationState in
+ *  offset:      the byte offset within the RAMBlock for the start of the page
+ * ram_addr_abs: global offset in the dirty/sent bitmaps
+ */
+static RAMBlock *ram_save_unqueue_page(MigrationState *ms, ram_addr_t *offset,
+                                       ram_addr_t *ram_addr_abs)
+{
+    RAMBlock *result = NULL;
+    qemu_mutex_lock(&ms->src_page_req_mutex);
+    if (!QSIMPLEQ_EMPTY(&ms->src_page_requests)) {
+        struct MigrationSrcPageRequest *entry =
+                                    QSIMPLEQ_FIRST(&ms->src_page_requests);
+        result = entry->rb;
+        *offset = entry->offset;
+        *ram_addr_abs = (entry->offset + entry->rb->offset) & TARGET_PAGE_MASK;
+
+        if (entry->len > TARGET_PAGE_SIZE) {
+            entry->len -= TARGET_PAGE_SIZE;
+            entry->offset += TARGET_PAGE_SIZE;
+        } else {
+            memory_region_unref(result->mr);
+            QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
+            g_free(entry);
+        }
+    }
+    qemu_mutex_unlock(&ms->src_page_req_mutex);
+
+    return result;
+}
+
+
 /**
  * Queue the pages for transmission, e.g. a request from postcopy destination
 *   ms: MigrationState in which the queue is held
@@ -987,6 +1032,58 @@ err:
 
 
 /**
+ * ram_save_host_page: Starting at *offset send pages up to the end
+ *                     of the current host page.  It's valid for the initial
+ *                     offset to point into the middle of a host page
+ *                     in which case the remainder of the host page is sent.
+ *                     Only dirty target pages are sent.
+ *
+ * Returns: Number of pages written.
+ *
+ * @f: QEMUFile where to send the data
+ * @block: pointer to block that contains the page we want to send
+ * @offset: offset inside the block for the page; updated to last target page
+ *          sent
+ * @last_stage: if we are at the completion stage
+ * @bytes_transferred: increase it with the number of transferred bytes
+ */
+static int ram_save_host_page(MigrationState *ms, QEMUFile *f, RAMBlock* block,
+                              ram_addr_t *offset, bool last_stage,
+                              uint64_t *bytes_transferred,
+                              ram_addr_t dirty_ram_abs)
+{
+    int tmppages, pages = 0;
+    do {
+        /* Check the page is dirty and, if it is, send it */
+        if (migration_bitmap_clear_dirty(dirty_ram_abs)) {
+            if (compression_switch && migrate_use_compression()) {
+                tmppages = ram_save_compressed_page(f, block, *offset,
+                                                    last_stage,
+                                                    bytes_transferred);
+            } else {
+                tmppages = ram_save_page(f, block, *offset, last_stage,
+                                         bytes_transferred);
+            }
+
+            if (tmppages < 0) {
+                return tmppages;
+            } else {
+                if (ms->sentmap) {
+                    set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
+                }
+            }
+            pages += tmppages;
+        }
+        *offset += TARGET_PAGE_SIZE;
+        dirty_ram_abs += TARGET_PAGE_SIZE;
+    } while (*offset & (qemu_host_page_size - 1));
+
+    /* The offset we leave with is the last one we looked at */
+    *offset -= TARGET_PAGE_SIZE;
+    return pages;
+}
+
+/**
  * ram_find_and_save_block: Finds a dirty page and sends it to f
  *
  * Called within an RCU critical section.
@@ -997,65 +1094,102 @@ err:
  * @f: QEMUFile where to send the data
  * @last_stage: if we are at the completion stage
  * @bytes_transferred: increase it with the number of transferred bytes
+ *
+ * On systems where host-page-size > target-page-size it will send all the
+ * pages in a host page that are dirty.
  */
 
 static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
                                    uint64_t *bytes_transferred)
 {
+    MigrationState *ms = migrate_get_current();
     RAMBlock *block = last_seen_block;
+    RAMBlock *tmpblock;
     ram_addr_t offset = last_offset;
+    ram_addr_t tmpoffset;
     bool complete_round = false;
     int pages = 0;
-    MemoryRegion *mr;
     ram_addr_t dirty_ram_abs; /* Address of the start of the dirty page in
                                  ram_addr_t space */
 
-    if (!block)
+    if (!block) {
         block = QLIST_FIRST_RCU(&ram_list.blocks);
+        last_was_from_queue = false;
+    }
 
-    while (true) {
-        mr = block->mr;
-        offset = migration_bitmap_find_and_reset_dirty(mr, offset,
-                                                       &dirty_ram_abs);
-        if (complete_round && block == last_seen_block &&
-            offset >= last_offset) {
-            break;
-        }
-        if (offset >= block->used_length) {
-            offset = 0;
-            block = QLIST_NEXT_RCU(block, next);
-            if (!block) {
-                block = QLIST_FIRST_RCU(&ram_list.blocks);
-                complete_round = true;
-                ram_bulk_stage = false;
-                if (migrate_use_xbzrle()) {
-                    /* If xbzrle is on, stop using the data compression at this
-                     * point. In theory, xbzrle can do better than compression.
-                     */
-                    flush_compressed_data(f);
-                    compression_switch = false;
-                }
+    while (true) { /* Until we send a block or run out of stuff to send */
+        tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &dirty_ram_abs);
+
+        if (tmpblock) {
+            /* We've got a block from the postcopy queue */
+            trace_ram_find_and_save_block_postcopy(tmpblock->idstr,
+                                                   (uint64_t)tmpoffset,
+                                                   (uint64_t)dirty_ram_abs);
+            /*
+             * We're sending this page, and since it's postcopy nothing else
+             * will dirty it, and we must make sure it doesn't get sent again
+             * even if this queue request was received after the background
+             * search already sent it.
+             */
+            if (!test_bit(dirty_ram_abs >> TARGET_PAGE_BITS,
+                          migration_bitmap)) {
+                trace_ram_find_and_save_block_postcopy_not_dirty(
+                    tmpblock->idstr, (uint64_t)tmpoffset,
+                    (uint64_t)dirty_ram_abs,
+                    test_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap));
+
+                continue;
             }
+            /*
+             * As soon as we start servicing pages out of order, we have
+             * to kill the bulk stage, since the bulk stage assumes (in
+             * migration_bitmap_find_dirty) that every page is dirty, and
+             * that's no longer true.
+             */
+            ram_bulk_stage = false;
+            /*
+             * We want the background search to continue from the queued page
+             * since the guest is likely to want other pages near to the page
+             * it just requested.
+             */
+            block = tmpblock;
+            offset = tmpoffset;
         } else {
-            if (compression_switch && migrate_use_compression()) {
-                pages = ram_save_compressed_page(f, block, offset, last_stage,
-                                                 bytes_transferred);
-            } else {
-                pages = ram_save_page(f, block, offset, last_stage,
-                                      bytes_transferred);
+            MemoryRegion *mr;
+            /* priority queue empty, so just search for something dirty */
+            mr = block->mr;
+            offset = migration_bitmap_find_dirty(mr, offset, &dirty_ram_abs);
+            if (complete_round && block == last_seen_block &&
+                offset >= last_offset) {
+                break;
             }
-
-            /* if page is unmodified, continue to the next */
-            if (pages > 0) {
-                MigrationState *ms = migrate_get_current();
-                if (ms->sentmap) {
-                    set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
+            if (offset >= block->used_length) {
+                offset = 0;
+                block = QLIST_NEXT_RCU(block, next);
+                if (!block) {
+                    block = QLIST_FIRST_RCU(&ram_list.blocks);
+                    complete_round = true;
+                    ram_bulk_stage = false;
+                    if (migrate_use_xbzrle()) {
+                        /* If xbzrle is on, stop using the data compression at
+                         * this point. In theory, xbzrle can do better than
+                         * compression.
+                         */
+                        flush_compressed_data(f);
+                        compression_switch = false;
+                    }
                 }
-
-                last_sent_block = block;
-                break;
+                continue; /* pick an offset in the new block */
             }
         }
+
+        pages = ram_save_host_page(ms, f, block, &offset, last_stage,
+                                   bytes_transferred, dirty_ram_abs);
+
+        /* if page is unmodified, continue to the next */
+        if (pages > 0) {
+            break;
+        }
     }
 
     last_seen_block = block;
@@ -1148,6 +1282,7 @@ static void reset_ram_globals(void)
     last_offset = 0;
     last_version = ram_list.version;
     ram_bulk_stage = true;
+    last_was_from_queue = false;
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
diff --git a/trace-events b/trace-events
index 04f6682..cb707c7 100644
--- a/trace-events
+++ b/trace-events
@@ -1231,6 +1231,8 @@ qemu_file_fclose(void) ""
 migration_bitmap_sync_start(void) ""
 migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64""
 migration_throttle(void) ""
+ram_find_and_save_block_postcopy(const char *block_name, uint64_t tmp_offset, uint64_t ram_addr) "%s/%" PRIx64 " ram_addr=%" PRIx64
+ram_find_and_save_block_postcopy_not_dirty(const char *block_name, uint64_t tmp_offset, uint64_t ram_addr, int sent) "%s/%" PRIx64 " ram_addr=%" PRIx64 " (sent=%d)"
 ram_postcopy_send_discard_bitmap(void) ""
 ram_save_queue_pages(const char *rbname, size_t start, size_t len) "%s: start: %zx len: %zx"
 
-- 
2.4.3


* [Qemu-devel] [PATCH v7 33/42] postcopy_ram.c: place_page and helpers
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (31 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 32/42] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-14 10:05   ` Juan Quintela
  2015-07-27  6:11   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 34/42] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
                   ` (9 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

postcopy_place_page and its helper provide a way for postcopy to place a
page into the guest's memory atomically (using the copy ioctl on the ufd).
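
A minimal sketch of the staging-page side of this, under stated
assumptions: struct sketch_incoming and the sketch_* functions are
hypothetical stand-ins, not the QEMU structures, and the userfault copy
ioctl itself is left out because it needs a userfaultfd-registered
region. The point shown is that one anonymous page is mapped lazily and
reused, and that mmap reports failure as MAP_FAILED rather than NULL.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS and getpagesize() on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical holder for the staging page; not the QEMU structure. */
struct sketch_incoming {
    void *tmp_page;
};

/* Map the staging page on first use; return the same address afterwards. */
static void *sketch_get_tmp_page(struct sketch_incoming *in)
{
    assert(in != NULL);
    if (!in->tmp_page) {
        void *p = mmap(NULL, (size_t)getpagesize(), PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            return NULL; /* failure is MAP_FAILED, never NULL */
        }
        in->tmp_page = p;
    }
    return in->tmp_page;
}

/* Unmap the staging page, if one was ever allocated. */
static void sketch_release_tmp_page(struct sketch_incoming *in)
{
    assert(in != NULL);
    if (in->tmp_page) {
        munmap(in->tmp_page, (size_t)getpagesize());
        in->tmp_page = NULL;
    }
}
```

A caller would fill the returned page with incoming data and then hand it
to the place routine; only a successful mapping is ever stored, so the
release path never sees a failed mapping.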

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h    |  1 +
 include/migration/postcopy-ram.h | 16 ++++++++
 migration/postcopy-ram.c         | 87 ++++++++++++++++++++++++++++++++++++++++
 trace-events                     |  1 +
 4 files changed, 105 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 20d1b39..8d2e5c8 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -79,6 +79,7 @@ struct MigrationIncomingState {
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
     PostcopyState postcopy_state;
+    void          *postcopy_tmp_page;
 
     /* See savevm.c */
     LoadStateEntry_Head loadvm_handlers;
diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 8a8616b..bac79c5 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -69,4 +69,20 @@ void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
 void postcopy_discard_send_finish(MigrationState *ms,
                                   PostcopyDiscardState *pds);
 
+/*
+ * Place a page (from) at (host) efficiently
+ *    There are restrictions on how 'from' must be mapped, in general best
+ *    to use other postcopy_ routines to allocate.
+ * returns 0 on success
+ */
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
+                        bool all_zero);
+
+/*
+ * Allocate a page of memory that can be mapped at a later point in time
+ * using postcopy_place_page
+ * Returns: Pointer to allocated page
+ */
+void *postcopy_get_tmp_page(MigrationIncomingState *mis);
+
 #endif
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 7158d08..6345480 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -279,6 +279,10 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
         return -1;
     }
 
+    if (mis->postcopy_tmp_page) {
+        munmap(mis->postcopy_tmp_page, getpagesize());
+        mis->postcopy_tmp_page = NULL;
+    }
     return 0;
 }
 
@@ -345,6 +349,77 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
     return 0;
 }
 
+/*
+ * Place a host page (from) at (host) atomically
+ * all_zero: Hint that the page being placed is 0 throughout
+ * returns 0 on success
+ */
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
+                        bool all_zero)
+{
+    if (!all_zero) {
+        struct uffdio_copy copy_struct;
+
+        copy_struct.dst = (uint64_t)(uintptr_t)host;
+        copy_struct.src = (uint64_t)(uintptr_t)from;
+        copy_struct.len = getpagesize();
+        copy_struct.mode = 0;
+
+        /* The copy also acks to the kernel, waking the stalled thread up.
+         * TODO: We can inhibit that ack and only do it if it was requested,
+         * which would be slightly cheaper, but we'd have to be careful
+         * about the order of updating our page state.
+         */
+        if (ioctl(mis->userfault_fd, UFFDIO_COPY, &copy_struct)) {
+            int e = errno;
+            error_report("%s: %s copy host: %p from: %p",
+                         __func__, strerror(e), host, from);
+
+            return -e;
+        }
+    } else {
+        struct uffdio_zeropage zero_struct;
+
+        zero_struct.range.start = (uint64_t)(uintptr_t)host;
+        zero_struct.range.len = getpagesize();
+        zero_struct.mode = 0;
+
+        if (ioctl(mis->userfault_fd, UFFDIO_ZEROPAGE, &zero_struct)) {
+            int e = errno;
+            error_report("%s: %s zero host: %p from: %p",
+                         __func__, strerror(e), host, from);
+
+            return -e;
+        }
+    }
+
+    trace_postcopy_place_page(host, all_zero);
+    return 0;
+}
+
+/*
+ * Returns a host page of memory that can be mapped at a later point in time
+ * using postcopy_place_page
+ * The same address is used repeatedly, postcopy_place_page just takes the
+ * backing page away.
+ * Returns: Pointer to allocated page
+ *
+ */
+void *postcopy_get_tmp_page(MigrationIncomingState *mis)
+{
+    if (!mis->postcopy_tmp_page) {
+        mis->postcopy_tmp_page = mmap(NULL, getpagesize(),
+                             PROT_READ | PROT_WRITE, MAP_PRIVATE |
+                             MAP_ANONYMOUS, -1, 0);
+        if (mis->postcopy_tmp_page == MAP_FAILED) {
+            error_report("%s: %s", __func__, strerror(errno));
+            return NULL;
+        }
+    }
+
+    return mis->postcopy_tmp_page;
+}
+
 #else
 /* No target OS support, stubs just fail */
 bool postcopy_ram_supported_by_host(void)
@@ -374,6 +449,18 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 {
     assert(0);
 }
+
+int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
+                        bool all_zero)
+{
+    assert(0);
+}
+
+void *postcopy_get_tmp_page(MigrationIncomingState *mis)
+{
+    assert(0);
+}
+
 #endif
 
 /* ------------------------------------------------------------------------- */
diff --git a/trace-events b/trace-events
index cb707c7..d9c5a51 100644
--- a/trace-events
+++ b/trace-events
@@ -1515,6 +1515,7 @@ postcopy_discard_send_finish(const char *ramblock, int nwords, int ncmds) "%s ma
 postcopy_cleanup_area(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
 postcopy_ram_discard_range(void *start, void *end) "%p,%p"
 postcopy_init_area(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
+postcopy_place_page(void *host_addr, bool all_zero) "host=%p all_zero=%d"
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
2.4.3


* [Qemu-devel] [PATCH v7 34/42] Postcopy: Use helpers to map pages during migration
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (32 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 33/42] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-14 12:34   ` Juan Quintela
  2015-07-27  7:39   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 35/42] Don't sync dirty bitmaps in postcopy Dr. David Alan Gilbert (git)
                   ` (8 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

In postcopy, the destination guest is running at the same time
as it's receiving pages; as we receive new pages we must put
them into the guest's address space atomically to avoid a running
CPU accessing a partially written page.

Use the helpers in postcopy-ram.c to map these pages.

qemu_get_buffer_less_copy is used to avoid a copy out of qemu_file
in the case that postcopy is going to do a copy anyway.
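
The address arithmetic behind this can be sketched as follows. This is
an illustration only: the sketch_* names and the 4 KiB/16 KiB page sizes
are assumptions for the example. Each incoming target page lands in the
temporary host page at its offset within the host page, and the host
page is placed once its last target page has arrived.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sizes only: 4 KiB target pages inside 16 KiB host pages. */
#define SKETCH_TARGET_PAGE_SIZE 4096u
#define SKETCH_HOST_PAGE_SIZE   16384u

/* Byte offset of a target page within the host page that contains it. */
static uintptr_t sketch_offset_in_host_page(uintptr_t host)
{
    return host & (SKETCH_HOST_PAGE_SIZE - 1);
}

/* True when 'host' is the first target page of its host page. */
static bool sketch_starts_host_page(uintptr_t host)
{
    return sketch_offset_in_host_page(host) == 0;
}

/* True when 'host' is the last target page of its host page. */
static bool sketch_place_needed(uintptr_t host)
{
    return ((host + SKETCH_TARGET_PAGE_SIZE) &
            (SKETCH_HOST_PAGE_SIZE - 1)) == 0;
}
```

When the two page sizes match, every target page both starts and ends a
host page, which is the case where the read can go straight into the
buffer used for placing.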

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/ram.c | 124 ++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 103 insertions(+), 21 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 316834b..01a0ab4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1729,7 +1729,17 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, void *host)
 /* Must be called from within a rcu critical section.
  * Returns a pointer from within the RCU-protected ram_list.
  */
+/*
+ * Read a RAMBlock ID from the stream f, find the host address of the
+ * start of that block and add on 'offset'
+ *
+ * f: Stream to read from
+ * mis: MigrationIncomingState
+ * offset: Offset within the block
+ * flags: Page flags (mostly to see if it's a continuation of previous block)
+ */
 static inline void *host_from_stream_offset(QEMUFile *f,
+                                            MigrationIncomingState *mis,
                                             ram_addr_t offset,
                                             int flags)
 {
@@ -1742,7 +1752,6 @@ static inline void *host_from_stream_offset(QEMUFile *f,
             error_report("Ack, bad migration stream!");
             return NULL;
         }
-
         return memory_region_get_ram_ptr(block->mr) + offset;
     }
 
@@ -1881,6 +1890,16 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     int flags = 0, ret = 0;
     static uint64_t seq_iter;
     int len = 0;
+    /*
+     * System is running in postcopy mode, page inserts to host memory must be
+     * atomic
+     */
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    bool postcopy_running = postcopy_state_get(mis) >=
+                            POSTCOPY_INCOMING_LISTENING;
+    void *postcopy_host_page = NULL;
+    bool postcopy_place_needed = false;
+    bool matching_page_sizes = qemu_host_page_size == TARGET_PAGE_SIZE;
 
     seq_iter++;
 
@@ -1896,13 +1915,57 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     rcu_read_lock();
     while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
         ram_addr_t addr, total_ram_bytes;
-        void *host;
+        void *host = NULL;
+        void *page_buffer = NULL;
+        void *postcopy_place_source = NULL;
         uint8_t ch;
+        bool all_zero = false;
 
         addr = qemu_get_be64(f);
         flags = addr & ~TARGET_PAGE_MASK;
         addr &= TARGET_PAGE_MASK;
 
+        if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
+                     RAM_SAVE_FLAG_XBZRLE)) {
+            host = host_from_stream_offset(f, mis, addr, flags);
+            if (!host) {
+                error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
+                ret = -EINVAL;
+                break;
+            }
+            if (!postcopy_running) {
+                page_buffer = host;
+            } else {
+                /*
+                 * Postcopy requires that we place whole host pages atomically.
+                 * To make it atomic, the data is read into a temporary page
+                 * that's moved into place later.
+                 * The migration protocol uses (possibly smaller) target pages;
+                 * however, the source ensures it always sends all the
+                 * components of a host page in order.
+                 */
+                if (!postcopy_host_page) {
+                    postcopy_host_page = postcopy_get_tmp_page(mis);
+                }
+                page_buffer = postcopy_host_page +
+                              ((uintptr_t)host & ~qemu_host_page_mask);
+                /* If every target page is zero we can optimise the place */
+                if (!((uintptr_t)host & ~qemu_host_page_mask)) {
+                    all_zero = true;
+                }
+
+                /*
+                 * If it's the last part of a host page then we place the host
+                 * page
+                 */
+                postcopy_place_needed = (((uintptr_t)host + TARGET_PAGE_SIZE) &
+                                         ~qemu_host_page_mask) == 0;
+                postcopy_place_source = postcopy_host_page;
+            }
+        } else {
+            postcopy_place_needed = false;
+        }
+
         switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
         case RAM_SAVE_FLAG_MEM_SIZE:
             /* Synchronize RAM block list */
@@ -1941,26 +2004,37 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             }
             break;
         case RAM_SAVE_FLAG_COMPRESS:
-            host = host_from_stream_offset(f, addr, flags);
-            if (!host) {
-                error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
-                ret = -EINVAL;
-                break;
-            }
             ch = qemu_get_byte(f);
-            ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
+            if (!postcopy_running) {
+                ram_handle_compressed(host, ch, TARGET_PAGE_SIZE);
+            } else {
+                memset(page_buffer, ch, TARGET_PAGE_SIZE);
+                if (ch) {
+                    all_zero = false;
+                }
+            }
             break;
+
         case RAM_SAVE_FLAG_PAGE:
-            host = host_from_stream_offset(f, addr, flags);
-            if (!host) {
-                error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
-                ret = -EINVAL;
-                break;
+            all_zero = false;
+            if (!postcopy_place_needed || !matching_page_sizes) {
+                qemu_get_buffer(f, page_buffer, TARGET_PAGE_SIZE);
+            } else {
+                /* Avoids the qemu_file copy during postcopy, which is
+                 * going to do a copy later; can only do it when we
+                 * do this read in one go (matching page sizes)
+                 */
+                qemu_get_buffer_less_copy(f, (uint8_t **)&postcopy_place_source,
+                                          TARGET_PAGE_SIZE);
             }
-            qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
             break;
         case RAM_SAVE_FLAG_COMPRESS_PAGE:
-            host = host_from_stream_offset(f, addr, flags);
+            all_zero = false;
+            if (postcopy_running) {
+                error_report("Compressed RAM in postcopy mode @%zx", addr);
+                return -EINVAL;
+            }
+            host = host_from_stream_offset(f, mis, addr, flags);
             if (!host) {
                 error_report("Invalid RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
@@ -1976,12 +2050,12 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
             qemu_get_buffer(f, compressed_data_buf, len);
             decompress_data_with_multi_threads(compressed_data_buf, host, len);
             break;
+
         case RAM_SAVE_FLAG_XBZRLE:
-            host = host_from_stream_offset(f, addr, flags);
-            if (!host) {
-                error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
-                ret = -EINVAL;
-                break;
+            all_zero = false;
+            if (postcopy_running) {
+                error_report("XBZRLE RAM block in postcopy mode @%zx", addr);
+                return -EINVAL;
             }
             if (load_xbzrle(f, addr, host) < 0) {
                 error_report("Failed to decompress XBZRLE page at "
@@ -2002,6 +2076,14 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                 ret = -EINVAL;
             }
         }
+
+        if (postcopy_place_needed) {
+            /* This gets called at the last target page in the host page */
+            ret = postcopy_place_page(mis, host + TARGET_PAGE_SIZE -
+                                           qemu_host_page_size,
+                                      postcopy_place_source,
+                                      all_zero);
+        }
         if (!ret) {
             ret = qemu_file_get_error(f);
         }
-- 
2.4.3


* [Qemu-devel] [PATCH v7 35/42] Don't sync dirty bitmaps in postcopy
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (33 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 34/42] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-14 12:36   ` Juan Quintela
  2015-07-27  7:43   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 36/42] Host page!=target page: Cleanup bitmaps Dr. David Alan Gilbert (git)
                   ` (7 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Once we're in postcopy the source processors are stopped and memory
shouldn't change any more, so there's no need to look at the dirty
bitmap.

There are two notes to this:
  1) If we did resync and a page had changed then the page would get
     sent again, which the destination wouldn't allow (since it might
     have also modified the page)
  2) Before disabling this I'd seen very rare cases where a page had been
     marked dirty although the memory contents were apparently identical
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 migration/ram.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 01a0ab4..5cff4d6 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1643,7 +1643,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 {
     rcu_read_lock();
 
-    migration_bitmap_sync();
+    if (!migration_postcopy_phase(migrate_get_current())) {
+        migration_bitmap_sync();
+    }
 
     ram_control_before_iterate(f, RAM_CONTROL_FINISH);
 
@@ -1678,7 +1680,8 @@ static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
 
     remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;
 
-    if (remaining_size < max_size) {
+    if (!migration_postcopy_phase(migrate_get_current()) &&
+        remaining_size < max_size) {
         qemu_mutex_lock_iothread();
         rcu_read_lock();
         migration_bitmap_sync();
-- 
2.4.3


* [Qemu-devel] [PATCH v7 36/42] Host page!=target page: Cleanup bitmaps
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (34 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 35/42] Don't sync dirty bitmaps in postcopy Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-14 15:01   ` Juan Quintela
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 37/42] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
                   ` (6 subsequent siblings)
  42 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Prior to the start of postcopy, ensure that everything that will
be transferred later is a whole host-page in size.

This is accomplished by discarding partially transferred host pages
and marking any that are partially dirty as fully dirty.
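
As a rough sketch of the idea (not the code in this patch), the check
for a single bitmap word can be written as below. It assumes one bit
per target page and host_bits target pages per host page, with
host_bits a power of two smaller than the number of bits in a long.
The names chunk_word and struct chunk_result are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical result type: one bit per target page in each mask */
struct chunk_result {
    unsigned long discard;  /* host pages the destination must drop */
    unsigned long redirty;  /* host pages to mark fully dirty */
};

/*
 * Examine one word of the sent and dirty bitmaps, host_bits target
 * pages at a time.  A host page that is only partly sent is discarded
 * and redirtied; one that is only partly dirty is redirtied.
 */
static struct chunk_result chunk_word(unsigned long sent,
                                      unsigned long dirty,
                                      unsigned int host_bits)
{
    const unsigned int long_bits = sizeof(long) * CHAR_BIT;
    struct chunk_result res = { 0, 0 };
    unsigned long host_mask;
    unsigned int hp;

    /* Assumption: power of two and strictly smaller than a long */
    assert(host_bits > 0 && host_bits < long_bits);
    assert((host_bits & (host_bits - 1)) == 0);
    host_mask = (1ul << host_bits) - 1;

    for (hp = 0; hp < long_bits; hp += host_bits) {
        unsigned long host_sent = (sent >> hp) & host_mask;
        unsigned long host_dirty = (dirty >> hp) & host_mask;

        if (host_sent && host_sent != host_mask) {
            /* Partially sent host page */
            res.discard |= host_mask << hp;
            res.redirty |= host_mask << hp;
        } else if (host_dirty && host_dirty != host_mask) {
            /* Partially dirty host page */
            res.redirty |= host_mask << hp;
        }
    }
    return res;
}
```

With four target pages per host page, a sent word of 0x0F3 has a
partly sent first host page and a fully sent second one, so only the
first is discarded; a dirty word of 0x100 makes the third host page
partly dirty, so it is redirtied along with the first.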

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/ram.c | 267 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 267 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 5cff4d6..a8a25aa 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1396,6 +1396,265 @@ static int postcopy_each_ram_send_discard(MigrationState *ms)
 }
 
 /*
+ * Helper for postcopy_chunk_hostpages where HPS/TPS >= bits-in-long
+ *
+ * !! Untested !!
+ */
+static int hostpage_big_chunk_helper(const char *block_name, void *host_addr,
+                                     ram_addr_t offset, ram_addr_t length,
+                                     void *opaque)
+{
+    MigrationState *ms = opaque;
+    unsigned long long_bits = sizeof(long) * 8;
+    unsigned int host_len = (qemu_host_page_size / TARGET_PAGE_SIZE) /
+                            long_bits;
+    unsigned long first_long, last_long, cur_long, current_hp;
+    unsigned long first = offset >> TARGET_PAGE_BITS;
+    unsigned long last = (offset + (length - 1)) >> TARGET_PAGE_BITS;
+
+    PostcopyDiscardState *pds = postcopy_discard_send_init(ms,
+                                                           first,
+                                                           block_name);
+    first_long = first / long_bits;
+    last_long = last / long_bits;
+
+    /*
+     * I'm assuming RAMBlocks must start at the start of host pages,
+     * but I guess they might not use the whole of the host page
+     */
+
+    /* Work along one host page at a time */
+    for (current_hp = first_long; current_hp <= last_long;
+         current_hp += host_len) {
+        bool discard = false;
+        bool redirty = false;
+        bool has_some_dirty = false;
+        bool has_some_undirty = false;
+        bool has_some_sent = false;
+        bool has_some_unsent = false;
+
+        /*
+         * Check each long of mask for this hp, and see if anything
+         * needs updating.
+         */
+        for (cur_long = current_hp; cur_long < (current_hp + host_len);
+             cur_long++) {
+            /* a chunk of sent pages */
+            unsigned long sdata = ms->sentmap[cur_long];
+            /* a chunk of dirty pages */
+            unsigned long ddata = migration_bitmap[cur_long];
+
+            if (sdata) {
+                has_some_sent = true;
+            }
+            if (sdata != ~0ul) {
+                has_some_unsent = true;
+            }
+            if (ddata) {
+                has_some_dirty = true;
+            }
+            if (ddata != ~0ul) {
+                has_some_undirty = true;
+            }
+
+        }
+
+        if (has_some_sent && has_some_unsent) {
+            /* Partially sent host page */
+            discard = true;
+            redirty = true;
+        }
+
+        if (has_some_dirty && has_some_undirty) {
+            /* Partially dirty host page */
+            redirty = true;
+        }
+
+        if (!discard && !redirty) {
+            /* All consistent - next host page */
+            continue;
+        }
+
+
+        /* Now walk the chunks again, sending discards etc */
+        for (cur_long = current_hp; cur_long < (current_hp + host_len);
+             cur_long++) {
+            unsigned long cur_bits = cur_long * long_bits;
+
+            /* a chunk of sent pages */
+            unsigned long sdata = ms->sentmap[cur_long];
+            /* a chunk of dirty pages */
+            unsigned long ddata = migration_bitmap[cur_long];
+
+            if (discard && sdata) {
+                /* Tell the destination to discard these pages */
+                postcopy_discard_send_range(ms, pds, cur_bits,
+                                            cur_bits + long_bits - 1);
+                /* And clear them in the sent data structure */
+                ms->sentmap[cur_long] = 0;
+            }
+
+            if (redirty) {
+                migration_bitmap[cur_long] = ~0ul;
+                /* Inc the count of dirty pages */
+                migration_dirty_pages += ctpopl(~ddata);
+            }
+        }
+    }
+
+    postcopy_discard_send_finish(ms, pds);
+
+    return 0;
+}
+
+/*
+ * When working on long chunks of a bitmap where the only valid section
+ * is between start..end (inclusive), generate a mask with only those
+ * valid bits set for the current long word within that bitmask.
+ */
+static unsigned long make_long_mask(unsigned long start, unsigned long end,
+                                    unsigned long cur_long)
+{
+    unsigned long long_bits = sizeof(long) * 8;
+    unsigned long long_bits_mask = long_bits - 1;
+    unsigned long first_long, last_long;
+    unsigned long mask = ~(unsigned long)0;
+    first_long = start / long_bits;
+    last_long = end / long_bits;
+
+    if ((cur_long == first_long) && (start & long_bits_mask)) {
+        /* e.g. (start & 31) = 3
+         *         1 << .    -> 2^3
+         *         . - 1     -> 2^3 - 1 i.e. mask 2..0
+         *         ~.        -> mask 31..3
+         */
+        mask &= ~((((unsigned long)1) << (start & long_bits_mask)) - 1);
+    }
+
+    if ((cur_long == last_long) && ((end & long_bits_mask) != long_bits_mask)) {
+        /* e.g. (end & 31) = 3
+         *            .   +1 -> 4
+         *         1 << .    -> 2^4
+         *         . -1      -> 2^4 - 1
+         *                   = mask set 3..0
+         */
+        mask &= (((unsigned long)1) << ((end & long_bits_mask) + 1)) - 1;
+    }
+
+    return mask;
+}
+
+/*
+ * Utility for the outgoing postcopy code.
+ *
+ * Discard any partially sent host-page size chunks, mark any partially
+ * dirty host-page size chunks as all dirty.
+ *
+ * Returns: 0 on success
+ */
+static int postcopy_chunk_hostpages(MigrationState *ms)
+{
+    struct RAMBlock *block;
+    unsigned int host_bits = qemu_host_page_size / TARGET_PAGE_SIZE;
+    unsigned long long_bits = sizeof(long) * 8;
+    unsigned long host_mask;
+
+    assert(is_power_of_2(host_bits));
+
+    if (qemu_host_page_size == TARGET_PAGE_SIZE) {
+        /* Easy case - TPS==HPS - nothing to be done */
+        return 0;
+    }
+
+    /* Easiest way to make sure we don't resume in the middle of a host-page */
+    last_seen_block = NULL;
+    last_sent_block = NULL;
+
+    /*
+     * The worst ratio currently known is ARM, which has 1kB target pages
+     * and can have 64kB host pages; that is 64 target pages per host page,
+     * inconveniently more than the bits in a long on ARM (32 bits), and a
+     * long is the underlying element of the migration bitmaps.
+     */
+    if (host_bits >= long_bits) {
+        /* Deal with the odd case separately */
+        return qemu_ram_foreach_block(hostpage_big_chunk_helper, ms);
+    } else {
+        host_mask = (1ul << host_bits) - 1;
+    }
+
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        unsigned long first_long, last_long, cur_long;
+        unsigned long first = block->offset >> TARGET_PAGE_BITS;
+        unsigned long last = (block->offset + (block->used_length - 1))
+                                >> TARGET_PAGE_BITS;
+        PostcopyDiscardState *pds = postcopy_discard_send_init(ms,
+                                                               first,
+                                                               block->idstr);
+
+        first_long = first / long_bits;
+        last_long = last / long_bits;
+        for (cur_long = first_long; cur_long <= last_long; cur_long++) {
+            unsigned long current_hp;
+            /* Deal with start/end not on alignment */
+            unsigned long mask = make_long_mask(first, last, cur_long);
+
+            /* a chunk of sent pages */
+            unsigned long sdata = ms->sentmap[cur_long];
+            /* a chunk of dirty pages */
+            unsigned long ddata = migration_bitmap[cur_long];
+            unsigned long discard = 0;
+            unsigned long redirty = 0;
+            sdata &= mask;
+            ddata &= mask;
+
+            for (current_hp = 0; current_hp < long_bits;
+                 current_hp += host_bits) {
+                unsigned long host_sent = (sdata >> current_hp) & host_mask;
+                unsigned long host_dirty = (ddata >> current_hp) & host_mask;
+
+                if (host_sent && (host_sent != host_mask)) {
+                    /* Partially sent host page */
+                    redirty |= host_mask << current_hp;
+                    discard |= host_mask << current_hp;
+
+                    /* Tell the destination to discard this page */
+                    postcopy_discard_send_range(ms, pds,
+                             cur_long * long_bits + current_hp /* start */,
+                             cur_long * long_bits + current_hp +
+                                 host_bits - 1 /* end */);
+                } else if (host_dirty && (host_dirty != host_mask)) {
+                    /* Partially dirty host page */
+                    redirty |= host_mask << current_hp;
+                }
+            }
+            if (discard) {
+                /* clear the page in the sentmap */
+                ms->sentmap[cur_long] &= ~discard;
+            }
+            if (redirty) {
+                /*
+                 * Reread original dirty bits and OR in ones we clear; we
+                 * must reread since we might be at the start or end of
+                 * a RAMBlock that the original 'mask' discarded some
+                 * bits from
+                 */
+                ddata = migration_bitmap[cur_long];
+                migration_bitmap[cur_long] = ddata | redirty;
+                /* Inc the count of dirty pages */
+                migration_dirty_pages += ctpopl(redirty - (ddata & redirty));
+            }
+        }
+
+        postcopy_discard_send_finish(ms, pds);
+    }
+
+    rcu_read_unlock();
+    return 0;
+}
+
+/*
  * Transmit the set of pages to be discarded after precopy to the target
  * these are pages that have been sent previously but have been dirtied
  * Hopefully this is pretty sparse
@@ -1405,9 +1664,17 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms)
     int ret;
 
     rcu_read_lock();
+
     /* This should be our last sync, the src is now paused */
     migration_bitmap_sync();
 
+    /* Deal with TPS != HPS */
+    ret = postcopy_chunk_hostpages(ms);
+    if (ret) {
+        rcu_read_unlock();
+        return ret;
+    }
+
     /*
      * Update the sentmap to be  sentmap&=dirty
      */
-- 
2.4.3


* [Qemu-devel] [PATCH v7 37/42] Postcopy; Handle userfault requests
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (35 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 36/42] Host page!=target page: Cleanup bitmaps Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-14 15:10   ` Juan Quintela
  2015-07-27 14:29   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 38/42] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
                   ` (5 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

userfaultfd is a Linux syscall that returns an fd; the fd receives a
stream of notifications of accesses to pages registered with it, and
lets the program acknowledge those stalls and tell the accessing
thread to carry on.

We convert the requests from the kernel into messages back to the
source asking for the pages.
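
A minimal sketch of that conversion, under stated assumptions: the
fault address has already been translated to an offset within a RAM
block, and the block name is only repeated when it changes, to save
space on the return path. struct page_req and make_page_req are
hypothetical names, and the sketch compares block names where the
fault thread in this patch compares RAMBlock pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request record; the real message format is not shown here */
struct page_req {
    const char *block_name; /* NULL means "same block as last request" */
    uint64_t offset;        /* host-page aligned offset in the block */
    size_t len;             /* one host page */
};

/*
 * Turn a faulting offset into a request for the whole host page that
 * contains it.  *last_block remembers the block named in the previous
 * request so that repeated names can be omitted.
 */
static struct page_req make_page_req(const char *block, uint64_t fault_offset,
                                     size_t host_page_size,
                                     const char **last_block)
{
    struct page_req req;

    /* Assumption: host_page_size is a power of two */
    assert(host_page_size && !(host_page_size & (host_page_size - 1)));

    /* Round down to the start of the containing host page */
    req.offset = fault_offset & ~((uint64_t)host_page_size - 1);
    req.len = host_page_size;
    if (*last_block && strcmp(*last_block, block) == 0) {
        req.block_name = NULL;
    } else {
        req.block_name = block;
        *last_block = block;
    }
    return req;
}
```

The request always covers a whole host page because that is the unit
in which the destination can place memory, even when the target page
size is smaller.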

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |   4 ++
 migration/postcopy-ram.c      | 155 +++++++++++++++++++++++++++++++++++++++---
 trace-events                  |   9 +++
 3 files changed, 159 insertions(+), 9 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 8d2e5c8..4f954ca 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -71,11 +71,15 @@ struct MigrationIncomingState {
      */
     QemuEvent      main_thread_load_event;
 
+    bool           have_fault_thread;
     QemuThread     fault_thread;
     QemuSemaphore  fault_thread_sem;
 
     /* For the kernel to send us notifications */
     int            userfault_fd;
+    /* To tell the fault_thread to quit */
+    int            userfault_quit_fd;
+
     QEMUFile *return_path;
     QemuMutex      rp_mutex;    /* We send replies from multiple threads */
     PostcopyState postcopy_state;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 6345480..7eb1fb9 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -49,6 +49,8 @@ struct PostcopyDiscardState {
  */
 #if defined(__linux__)
 
+#include <poll.h>
+#include <sys/eventfd.h>
 #include <sys/mman.h>
 #include <sys/ioctl.h>
 #include <sys/syscall.h>
@@ -274,15 +276,41 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
  */
 int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
 {
-    /* TODO: Join the fault thread once we're sure it will exit */
-    if (qemu_ram_foreach_block(cleanup_area, mis)) {
-        return -1;
+    trace_postcopy_ram_incoming_cleanup_entry();
+
+    if (mis->have_fault_thread) {
+        uint64_t tmp64;
+
+        if (qemu_ram_foreach_block(cleanup_area, mis)) {
+            return -1;
+        }
+        /*
+         * Tell the fault_thread to exit; userfault_quit_fd is an eventfd
+         * that should currently be at 0, and we increment it to 1.
+         */
+        tmp64 = 1;
+        if (write(mis->userfault_quit_fd, &tmp64, 8) == 8) {
+            trace_postcopy_ram_incoming_cleanup_join();
+            qemu_thread_join(&mis->fault_thread);
+        } else {
+            /* Not much we can do here, but may as well report it */
+            error_report("%s: incrementing userfault_quit_fd: %s", __func__,
+                         strerror(errno));
+        }
+        trace_postcopy_ram_incoming_cleanup_closeuf();
+        close(mis->userfault_fd);
+        close(mis->userfault_quit_fd);
+        mis->have_fault_thread = false;
     }
 
+    postcopy_state_set(mis, POSTCOPY_INCOMING_END);
+    migrate_send_rp_shut(mis, qemu_file_get_error(mis->file) != 0);
+
     if (mis->postcopy_tmp_page) {
         munmap(mis->postcopy_tmp_page, getpagesize());
         mis->postcopy_tmp_page = NULL;
     }
+    trace_postcopy_ram_incoming_cleanup_exit();
     return 0;
 }
 
@@ -321,31 +349,140 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
 static void *postcopy_ram_fault_thread(void *opaque)
 {
     MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
+    struct uffd_msg msg;
+    int ret;
+    size_t hostpagesize = getpagesize();
+    RAMBlock *rb = NULL;
+    RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */
 
-    fprintf(stderr, "postcopy_ram_fault_thread\n");
-    /* TODO: In later patch */
+    trace_postcopy_ram_fault_thread_entry();
     qemu_sem_post(&mis->fault_thread_sem);
-    while (1) {
-        /* TODO: In later patch */
-    }
 
+    while (true) {
+        ram_addr_t rb_offset;
+        ram_addr_t in_raspace;
+        struct pollfd pfd[2];
+
+        /*
+         * We're mainly waiting for the kernel to give us a faulting HVA,
+         * however we can be told to quit via userfault_quit_fd which is
+         * an eventfd
+         */
+        pfd[0].fd = mis->userfault_fd;
+        pfd[0].events = POLLIN;
+        pfd[0].revents = 0;
+        pfd[1].fd = mis->userfault_quit_fd;
+        pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
+        pfd[1].revents = 0;
+
+        if (poll(pfd, 2, -1 /* Wait forever */) == -1) {
+            error_report("%s: userfault poll: %s", __func__, strerror(errno));
+            break;
+        }
+
+        if (pfd[1].revents) {
+            trace_postcopy_ram_fault_thread_quit();
+            break;
+        }
+
+        ret = read(mis->userfault_fd, &msg, sizeof(msg));
+        if (ret != sizeof(msg)) {
+            if (ret < 0 && errno == EAGAIN) {
+                /*
+                 * If a wake up happens on the other thread just after
+                 * the poll, there is nothing to read.
+                 */
+                continue;
+            }
+            if (ret < 0) {
+                error_report("%s: Failed to read full userfault message: %s",
+                             __func__, strerror(errno));
+                break;
+            } else {
+                error_report("%s: Read %d bytes from userfaultfd expected %zu",
+                             __func__, ret, sizeof(msg));
+                break; /* Lost alignment, don't know what we'd read next */
+            }
+        }
+        if (msg.event != UFFD_EVENT_PAGEFAULT) {
+            error_report("%s: Read unexpected event %u from userfaultfd",
+                         __func__, msg.event);
+            continue; /* It's not a page fault, shouldn't happen */
+        }
+
+        rb = qemu_ram_block_from_host(
+                 (void *)(uintptr_t)msg.arg.pagefault.address,
+                 true, &in_raspace, &rb_offset);
+        if (!rb) {
+            error_report("postcopy_ram_fault_thread: Fault outside guest: %"
+                         PRIx64, (uint64_t)msg.arg.pagefault.address);
+            break;
+        }
+
+        rb_offset &= ~(hostpagesize - 1);
+        trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
+                                                qemu_ram_get_idstr(rb),
+                                                rb_offset);
+
+        /*
+         * Send the request to the source - we want to request one
+         * of our host page sizes (which is >= TPS)
+         */
+        if (rb != last_rb) {
+            last_rb = rb;
+            migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
+                                     rb_offset, hostpagesize);
+        } else {
+            /* Save some space */
+            migrate_send_rp_req_pages(mis, NULL,
+                                     rb_offset, hostpagesize);
+        }
+    }
+    trace_postcopy_ram_fault_thread_exit();
     return NULL;
 }
 
 int postcopy_ram_enable_notify(MigrationIncomingState *mis)
 {
-    /* Create the fault handler thread and wait for it to be ready */
+    /* Open the fd for the kernel to give us userfaults */
+    mis->userfault_fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+    if (mis->userfault_fd == -1) {
+        error_report("%s: Failed to open userfault fd: %s", __func__,
+                     strerror(errno));
+        return -1;
+    }
+
+    /*
+     * Although the host check already tested the API, we need to
+     * do the check again as an ABI handshake on the new fd.
+     */
+    if (!ufd_version_check(mis->userfault_fd)) {
+        return -1;
+    }
+
+    /* Now an eventfd we use to tell the fault-thread to quit */
+    mis->userfault_quit_fd = eventfd(0, EFD_CLOEXEC);
+    if (mis->userfault_quit_fd == -1) {
+        error_report("%s: Opening userfault_quit_fd: %s", __func__,
+                     strerror(errno));
+        close(mis->userfault_fd);
+        return -1;
+    }
+
     qemu_sem_init(&mis->fault_thread_sem, 0);
     qemu_thread_create(&mis->fault_thread, "postcopy/fault",
                        postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
     qemu_sem_wait(&mis->fault_thread_sem);
     qemu_sem_destroy(&mis->fault_thread_sem);
+    mis->have_fault_thread = true;
 
     /* Mark so that we get notified of accesses to unwritten areas */
     if (qemu_ram_foreach_block(ram_block_enable_notify, mis)) {
         return -1;
     }
 
+    trace_postcopy_ram_enable_notify();
+
     return 0;
 }
 
diff --git a/trace-events b/trace-events
index d9c5a51..ab201f9 100644
--- a/trace-events
+++ b/trace-events
@@ -1516,6 +1516,15 @@ postcopy_cleanup_area(const char *ramblock, void *host_addr, size_t offset, size
 postcopy_ram_discard_range(void *start, void *end) "%p,%p"
 postcopy_init_area(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
 postcopy_place_page(void *host_addr, bool all_zero) "host=%p all_zero=%d"
+postcopy_ram_enable_notify(void) ""
+postcopy_ram_fault_thread_entry(void) ""
+postcopy_ram_fault_thread_exit(void) ""
+postcopy_ram_fault_thread_quit(void) ""
+postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
+postcopy_ram_incoming_cleanup_closeuf(void) ""
+postcopy_ram_incoming_cleanup_entry(void) ""
+postcopy_ram_incoming_cleanup_exit(void) ""
+postcopy_ram_incoming_cleanup_join(void) ""
 
 # kvm-all.c
 kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
-- 
2.4.3


* [Qemu-devel] [PATCH v7 38/42] Start up a postcopy/listener thread ready for incoming page data
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (36 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 37/42] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-14 15:12   ` Juan Quintela
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 39/42] postcopy: Wire up loadvm_postcopy_handle_ commands Dr. David Alan Gilbert (git)
                   ` (4 subsequent siblings)
  42 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The loading of a device state (during postcopy) may access guest
memory that's still on the source machine and thus might need
a page fill; split off a separate thread that handles the incoming
page data so that the original incoming migration code can finish
off the device data.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/migration/migration.h |  4 +++
 migration/migration.c         |  6 ++++
 migration/savevm.c            | 79 ++++++++++++++++++++++++++++++++++++++++++-
 trace-events                  |  2 ++
 4 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 4f954ca..da6529d 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -75,6 +75,10 @@ struct MigrationIncomingState {
     QemuThread     fault_thread;
     QemuSemaphore  fault_thread_sem;
 
+    bool           have_listen_thread;
+    QemuThread     listen_thread;
+    QemuSemaphore  listen_thread_sem;
+
     /* For the kernel to send us notifications */
     int            userfault_fd;
     /* To tell the fault_thread to quit */
diff --git a/migration/migration.c b/migration/migration.c
index 7fa982e..e908b50 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1222,6 +1222,12 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
         goto fail;
     }
 
+    /*
+     * Make sure the receiver can get incoming pages before we send the rest
+     * of the state
+     */
+    qemu_savevm_send_postcopy_listen(fb);
+
     qemu_savevm_state_complete_precopy(fb);
     qemu_savevm_send_ping(fb, 3);
 
diff --git a/migration/savevm.c b/migration/savevm.c
index b87238a..258c551 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1314,6 +1314,65 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
     return 0;
 }
 
+/*
+ * Triggered by a postcopy_listen command; this thread takes over reading
+ * the input stream, leaving the main thread free to carry on loading the rest
+ * of the device state (from RAM).
+ * (TODO: This could do with being in a postcopy file, but then again it's
+ * just another input loop, not that postcopy specific)
+ */
+static void *postcopy_ram_listen_thread(void *opaque)
+{
+    QEMUFile *f = opaque;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    int load_res;
+
+    qemu_sem_post(&mis->listen_thread_sem);
+    trace_postcopy_ram_listen_thread_start();
+
+    /*
+     * Because we're a thread and not a coroutine we can't yield
+     * in qemu_file, and thus we must be blocking now.
+     */
+    qemu_file_change_blocking(f, true);
+    load_res = qemu_loadvm_state_main(f, mis);
+    /* And non-blocking again so we don't block in any cleanup */
+    qemu_file_change_blocking(f, false);
+
+    trace_postcopy_ram_listen_thread_exit();
+    if (load_res < 0) {
+        error_report("%s: loadvm failed: %d", __func__, load_res);
+        qemu_file_set_error(f, load_res);
+    } else {
+        /*
+         * This looks good, but it's possible that the device loading in the
+         * main thread hasn't finished yet, and so we might not be in 'RUN'
+         * state yet; wait for the end of the main thread.
+         */
+        qemu_event_wait(&mis->main_thread_load_event);
+    }
+    postcopy_ram_incoming_cleanup(mis);
+    /*
+     * If everything has worked fine, then the main thread has waited
+     * for us to start, and we're the last use of the mis.
+     * (If something broke then qemu will have to exit anyway since it's
+     * got a bad migration state).
+     */
+    migration_incoming_state_destroy();
+
+    if (load_res < 0) {
+        /*
+         * If something went wrong then we have a bad state so exit;
+         * depending how far we got it might be possible at this point
+         * to leave the guest running and fire MCEs for pages that never
+         * arrived as a desperate recovery step.
+         */
+        exit(EXIT_FAILURE);
+    }
+
+    return NULL;
+}
+
 /* After this message we must be able to immediately receive postcopy data */
 static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
 {
@@ -1333,7 +1392,20 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
         return -1;
     }
 
-    /* TODO start up the postcopy listening thread */
+    if (mis->have_listen_thread) {
+        error_report("CMD_POSTCOPY_RAM_LISTEN already has a listen thread");
+        return -1;
+    }
+
+    mis->have_listen_thread = true;
+    /* Start up the listening thread and wait for it to signal ready */
+    qemu_sem_init(&mis->listen_thread_sem, 0);
+    qemu_thread_create(&mis->listen_thread, "postcopy/listen",
+                       postcopy_ram_listen_thread, mis->file,
+                       QEMU_THREAD_JOINABLE);
+    qemu_sem_wait(&mis->listen_thread_sem);
+    qemu_sem_destroy(&mis->listen_thread_sem);
+
     return 0;
 }
 
@@ -1657,6 +1729,11 @@ int qemu_loadvm_state(QEMUFile *f)
     qemu_event_set(&mis->main_thread_load_event);
 
     trace_qemu_loadvm_state_post_main(ret);
+    if (mis->have_listen_thread) {
+        /* Listen thread still going, can't clean up yet */
+        return ret;
+    }
+
     if (ret == 0) {
         int file_error_after_eof = qemu_file_get_error(f);
 
diff --git a/trace-events b/trace-events
index ab201f9..a6c7839 100644
--- a/trace-events
+++ b/trace-events
@@ -1199,6 +1199,8 @@ loadvm_postcopy_ram_handle_discard_end(void) ""
 loadvm_postcopy_ram_handle_discard_header(const char *ramid, uint16_t len) "%s: %ud"
 loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
 loadvm_process_command_ping(uint32_t val) "%x"
+postcopy_ram_listen_thread_exit(void) ""
+postcopy_ram_listen_thread_start(void) ""
 qemu_savevm_send_postcopy_advise(void) ""
 qemu_savevm_send_postcopy_ram_discard(const char *id, uint16_t len) "%s: %ud"
 savevm_section_start(const char *id, unsigned int section_id) "%s, section_id %u"
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 209+ messages in thread
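
The start-up handshake in the patch above (create the listener, then block
until it signals that it is running) can be sketched outside QEMU with plain
POSIX primitives. This is only an illustration under assumptions: the names
listen_state, listen_thread and start_listener are hypothetical, and
sem_t/pthread_create stand in for QEMU's qemu_sem_* and qemu_thread_create
wrappers.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant MigrationIncomingState fields. */
struct listen_state {
    bool have_thread;   /* guards against a second LISTEN command */
    bool started;       /* set by the thread before it signals ready */
    sem_t ready;
    pthread_t thread;
};

static void *listen_thread(void *opaque)
{
    struct listen_state *st = opaque;

    st->started = true;
    sem_post(&st->ready);   /* tell the creator we are running */
    /* A real listener would now take over reading the input stream. */
    return NULL;
}

/* Returns 0 on success, -1 if a listener already exists or setup fails. */
static int start_listener(struct listen_state *st)
{
    if (st->have_thread) {
        return -1;
    }
    if (sem_init(&st->ready, 0, 0) != 0) {
        return -1;
    }
    if (pthread_create(&st->thread, NULL, listen_thread, st) != 0) {
        sem_destroy(&st->ready);
        return -1;
    }
    st->have_thread = true;
    while (sem_wait(&st->ready) != 0) {
        /* retry if interrupted by a signal */
    }
    sem_destroy(&st->ready);
    return 0;
}
```

The semaphore is only needed for the duration of the handshake, which is why
it can be destroyed as soon as the wait returns, as the patch does.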

* [Qemu-devel] [PATCH v7 39/42] postcopy: Wire up loadvm_postcopy_handle_ commands
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (37 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 38/42] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-14 15:14   ` Juan Quintela
  2015-07-28  5:53   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 40/42] End of migration for postcopy Dr. David Alan Gilbert (git)
                   ` (3 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Wire up more of the handlers for the commands on the destination side,
in particular loadvm_postcopy_handle_run now has enough to start the
guest running.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/savevm.c | 29 ++++++++++++++++++++++++++++-
 trace-events       |  2 ++
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 258c551..13dd6de 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1413,12 +1413,34 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
 static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
 {
     PostcopyState ps = postcopy_state_set(mis, POSTCOPY_INCOMING_RUNNING);
+    Error *local_err = NULL;
+
     trace_loadvm_postcopy_handle_run();
     if (ps != POSTCOPY_INCOMING_LISTENING) {
         error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps);
         return -1;
     }
 
+    /* TODO we should move all of this lot into postcopy_ram.c or a shared code
+     * in migration.c
+     */
+    cpu_synchronize_all_post_init();
+
+    qemu_announce_self();
+
+    /* Make sure all file formats flush their mutable metadata */
+    bdrv_invalidate_cache_all(&local_err);
+    if (local_err) {
+        qerror_report_err(local_err);
+        error_free(local_err);
+        return -1;
+    }
+
+    trace_loadvm_postcopy_handle_run_cpu_sync();
+    cpu_synchronize_all_post_init();
+
+    trace_loadvm_postcopy_handle_run_vmstart();
+
     if (autostart) {
         /* Hold onto your hats, starting the CPU */
         vm_start();
@@ -1427,7 +1449,12 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
         runstate_set(RUN_STATE_PAUSED);
     }
 
-    return 0;
+    /* We need to finish reading the stream from the package
+     * and also stop reading anything more from the stream that loaded the
+     * package (since it's now being read by the listener thread).
+     * LOADVM_QUIT will quit all the layers of nested loadvm loops.
+     */
+    return LOADVM_QUIT;
 }
 
 static int loadvm_process_command_simple_lencheck(const char *name,
diff --git a/trace-events b/trace-events
index a6c7839..6e9b32a 100644
--- a/trace-events
+++ b/trace-events
@@ -1194,6 +1194,8 @@ loadvm_handle_cmd_packaged_received(int ret) "%d"
 loadvm_postcopy_handle_advise(void) ""
 loadvm_postcopy_handle_listen(void) ""
 loadvm_postcopy_handle_run(void) ""
+loadvm_postcopy_handle_run_cpu_sync(void) ""
+loadvm_postcopy_handle_run_vmstart(void) ""
 loadvm_postcopy_ram_handle_discard(void) ""
 loadvm_postcopy_ram_handle_discard_end(void) ""
 loadvm_postcopy_ram_handle_discard_header(const char *ramid, uint16_t len) "%s: %ud"
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 209+ messages in thread

* [Qemu-devel] [PATCH v7 40/42] End of migration for postcopy
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (38 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 39/42] postcopy: Wire up loadvm_postcopy_handle_ commands Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-14 15:15   ` Juan Quintela
  2015-07-28  5:55   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 41/42] Disable mlock around incoming postcopy Dr. David Alan Gilbert (git)
                   ` (2 subsequent siblings)
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Tweak the end of migration cleanup; we don't want to close stuff down
at the end of the main stream, since the postcopy is still sending pages
on the other thread.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 25 ++++++++++++++++++++++++-
 trace-events          |  2 ++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index e908b50..ea495af 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -179,12 +179,35 @@ static void process_incoming_migration_co(void *opaque)
 {
     QEMUFile *f = opaque;
     Error *local_err = NULL;
+    MigrationIncomingState *mis;
+    PostcopyState ps;
     int ret;
 
-    migration_incoming_state_new(f);
+    mis = migration_incoming_state_new(f);
 
     ret = qemu_loadvm_state(f);
 
+    ps = postcopy_state_get(mis);
+    trace_process_incoming_migration_co_end(ret, ps);
+    if (ps != POSTCOPY_INCOMING_NONE) {
+        if (ps == POSTCOPY_INCOMING_ADVISE) {
+            /*
+             * Where a migration had postcopy enabled (and thus went to advise)
+             * but managed to complete within the precopy period, we can use
+             * the normal exit.
+             */
+            postcopy_ram_incoming_cleanup(mis);
+        } else if (ret >= 0) {
+            /*
+             * Postcopy was started, cleanup should happen at the end of the
+             * postcopy thread.
+             */
+            trace_process_incoming_migration_co_postcopy_end_main();
+            return;
+        }
+        /* Else something went wrong; fall through to the normal exit */
+    }
+
     qemu_fclose(f);
     free_xbzrle_decoded_buf();
     migration_incoming_state_destroy();
diff --git a/trace-events b/trace-events
index 6e9b32a..c58c582 100644
--- a/trace-events
+++ b/trace-events
@@ -1451,6 +1451,8 @@ source_return_path_thread_loop_top(void) ""
 source_return_path_thread_pong(uint32_t val) "%x"
 source_return_path_thread_shut(uint32_t val) "%x"
 migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
+process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
+process_incoming_migration_co_postcopy_end_main(void) ""
 
 # migration/rdma.c
 qemu_dma_accept_incoming_migration(void) ""
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 209+ messages in thread
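
The exit decision added to process_incoming_migration_co above boils down to
a small function of the postcopy state and the load result. The sketch below
restates it as a pure function so the three outcomes are easy to see; the
enum and function names are hypothetical and are not part of the patch.

```c
/* Hypothetical mirror of the incoming postcopy states used in the series. */
enum postcopy_state { PS_NONE, PS_ADVISE, PS_LISTENING, PS_RUNNING, PS_END };

enum exit_action {
    ACT_NORMAL_EXIT,          /* close the file and destroy incoming state */
    ACT_CLEANUP_THEN_EXIT,    /* postcopy cleanup first, then normal exit */
    ACT_DEFER_TO_LISTENER     /* return; the listener thread cleans up */
};

static enum exit_action incoming_exit_action(enum postcopy_state ps, int ret)
{
    if (ps == PS_NONE) {
        return ACT_NORMAL_EXIT;         /* postcopy was never enabled */
    }
    if (ps == PS_ADVISE) {
        return ACT_CLEANUP_THEN_EXIT;   /* finished within precopy */
    }
    if (ret >= 0) {
        return ACT_DEFER_TO_LISTENER;   /* postcopy is still running */
    }
    return ACT_NORMAL_EXIT;             /* postcopy started but load failed */
}
```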

* [Qemu-devel] [PATCH v7 41/42] Disable mlock around incoming postcopy
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (39 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 40/42] End of migration for postcopy Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-14 15:22   ` Juan Quintela
  2015-07-28  6:02   ` Amit Shah
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 42/42] Inhibit ballooning during postcopy Dr. David Alan Gilbert (git)
  2015-07-28  6:21 ` [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Amit Shah
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Userfault doesn't work with mlock; mlock is designed to nail down pages
so they don't move, userfault is designed to tell you when they're not
there.

munlock the pages we userfault protect before postcopy.
mlock everything again at the end if mlock is enabled.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 include/sysemu/sysemu.h  |  1 +
 migration/postcopy-ram.c | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 1af2ea0..c1f3da4 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -171,6 +171,7 @@ extern int boot_menu;
 extern bool boot_strict;
 extern uint8_t *boot_splash_filedata;
 extern size_t boot_splash_filedata_size;
+extern bool enable_mlock;
 extern uint8_t qemu_extra_params_fw[2];
 extern QEMUClockType rtc_clock;
 extern const char *mem_path;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 7eb1fb9..be7e5f2 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -85,6 +85,11 @@ static bool ufd_version_check(int ufd)
     return true;
 }
 
+/*
+ * Note: This has the side effect of munlock'ing all of RAM, that's
+ * normally fine since if the postcopy succeeds it gets turned back on at the
+ * end.
+ */
 bool postcopy_ram_supported_by_host(void)
 {
     long pagesize = getpagesize();
@@ -113,6 +118,15 @@ bool postcopy_ram_supported_by_host(void)
     }
 
     /*
+     * userfault and mlock don't go together; we'll put it back later if
+     * it was enabled.
+     */
+    if (munlockall()) {
+        error_report("%s: munlockall: %s", __func__,  strerror(errno));
+        return false;
+    }
+
+    /*
      *  We need to check that the ops we need are supported on anon memory
      *  To do that we need to register a chunk and see the flags that
      *  are returned.
@@ -303,6 +317,16 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
         mis->have_fault_thread = false;
     }
 
+    if (enable_mlock) {
+        if (os_mlock() < 0) {
+            error_report("mlock: %s", strerror(errno));
+            /*
+             * It doesn't feel right to fail at this point, we have a valid
+             * VM state.
+             */
+        }
+    }
+
     postcopy_state_set(mis, POSTCOPY_INCOMING_END);
     migrate_send_rp_shut(mis, qemu_file_get_error(mis->file) != 0);
 
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 209+ messages in thread
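
The unlock-then-relock pattern described in the commit message above can be
sketched with the plain POSIX calls. This is an illustration under
assumptions: the helper names are hypothetical, and mlockall(MCL_CURRENT |
MCL_FUTURE) is assumed here as a stand-in for QEMU's os_mlock().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Drop all memory locks before registering ranges for userfault. */
static bool unlock_for_userfault(void)
{
    if (munlockall() != 0) {
        fprintf(stderr, "munlockall: %s\n", strerror(errno));
        return false;   /* a bool function must not return -1 here */
    }
    return true;
}

/*
 * Re-lock memory once postcopy has finished, but only if locking was
 * requested in the first place. Returns 0 on success or if there was
 * nothing to do, -1 if the lock could not be restored.
 */
static int relock_after_postcopy(bool mlock_enabled)
{
    if (!mlock_enabled) {
        return 0;
    }
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        fprintf(stderr, "mlockall: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

A caller in the position of postcopy_ram_incoming_cleanup could choose to log
a failure from the relock step and carry on, since the VM state is already
valid at that point.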

* [Qemu-devel] [PATCH v7 42/42] Inhibit ballooning during postcopy
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (40 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 41/42] Disable mlock around incoming postcopy Dr. David Alan Gilbert (git)
@ 2015-06-16 10:26 ` Dr. David Alan Gilbert (git)
  2015-07-14 15:24   ` Juan Quintela
  2015-07-28  6:15   ` Amit Shah
  2015-07-28  6:21 ` [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Amit Shah
  42 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2015-06-16 10:26 UTC (permalink / raw)
  To: qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Postcopy detects accesses to pages that haven't been transferred yet
using userfaultfd, and it causes exceptions on pages that are 'not
present'.
Ballooning also causes pages to be marked as 'not present' when the
guest inflates the balloon.
Potentially a balloon could be inflated to discard pages that are
currently inflight during postcopy and that may be arriving at about
the same time.

To avoid this confusion, disable ballooning during postcopy.

When disabled we drop balloon requests from the guest.  Since ballooning
is generally initiated by the host, the management system should avoid
initiating any balloon instructions to the guest during migration,
although it's not possible to know how long it would take a guest to
process a request made prior to the start of migration.

Queueing the requests until after migration would be nice, but is
non-trivial, since the set of inflate/deflate requests have to
be compared with the state of the page to know what the final
outcome is allowed to be.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 balloon.c                  | 11 +++++++++++
 hw/virtio/virtio-balloon.c |  4 +++-
 include/sysemu/balloon.h   |  2 ++
 migration/postcopy-ram.c   |  9 +++++++++
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/balloon.c b/balloon.c
index c7033e3..80a8280 100644
--- a/balloon.c
+++ b/balloon.c
@@ -35,6 +35,17 @@
 static QEMUBalloonEvent *balloon_event_fn;
 static QEMUBalloonStatus *balloon_stat_fn;
 static void *balloon_opaque;
+static bool balloon_inhibited;
+
+bool qemu_balloon_is_inhibited(void)
+{
+    return balloon_inhibited;
+}
+
+void qemu_balloon_inhibit(bool state)
+{
+    balloon_inhibited = state;
+}
 
 static bool have_balloon(Error **errp)
 {
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 78bc14f..7d7170a 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -37,9 +37,11 @@
 static void balloon_page(void *addr, int deflate)
 {
 #if defined(__linux__)
-    if (!kvm_enabled() || kvm_has_sync_mmu())
+    if (!qemu_balloon_is_inhibited() && (!kvm_enabled() ||
+                                         kvm_has_sync_mmu())) {
         qemu_madvise(addr, TARGET_PAGE_SIZE,
                 deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
+    }
 #endif
 }
 
diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
index 0345e01..6851d99 100644
--- a/include/sysemu/balloon.h
+++ b/include/sysemu/balloon.h
@@ -23,5 +23,7 @@ typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
 int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
 			     QEMUBalloonStatus *stat_func, void *opaque);
 void qemu_remove_balloon_handler(void *opaque);
+bool qemu_balloon_is_inhibited(void);
+void qemu_balloon_inhibit(bool state);
 
 #endif
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index be7e5f2..4670cf0 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -24,6 +24,7 @@
 #include "migration/migration.h"
 #include "migration/postcopy-ram.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/balloon.h"
 #include "qemu/error-report.h"
 #include "trace.h"
 
@@ -317,6 +318,8 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
         mis->have_fault_thread = false;
     }
 
+    qemu_balloon_inhibit(false);
+
     if (enable_mlock) {
         if (os_mlock() < 0) {
             error_report("mlock: %s", strerror(errno));
@@ -505,6 +508,12 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
         return -1;
     }
 
+    /*
+     * Ballooning can mark pages as absent while we're postcopying;
+     * that would cause false userfaults.
+     */
+    qemu_balloon_inhibit(true);
+
     trace_postcopy_ram_enable_notify();
 
     return 0;
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 209+ messages in thread
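
The inhibit flag in the patch above gates every balloon request: while it is
set, requests are dropped rather than queued. A minimal stand-alone sketch of
that gating follows; the function names are hypothetical, and the boolean
backend_ok is an assumed stand-in for the "!kvm_enabled() ||
kvm_has_sync_mmu()" condition.

```c
#include <stdbool.h>

static bool balloon_inhibited;

static void balloon_inhibit(bool state)
{
    balloon_inhibited = state;
}

static bool balloon_is_inhibited(void)
{
    return balloon_inhibited;
}

/*
 * Returns true if the request would be applied to host memory.
 * While inhibited the request is silently dropped, not queued.
 */
static bool balloon_request(bool backend_ok)
{
    if (balloon_is_inhibited() || !backend_ok) {
        return false;
    }
    /* Real code would call madvise() on the page here. */
    return true;
}
```

Setting the flag when userfault notification is enabled and clearing it in
the incoming cleanup path, as the patch does, brackets the whole postcopy
phase.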

* Re: [Qemu-devel] [PATCH v7 17/42] Add migration-capability boolean for postcopy-ram.
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 17/42] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
@ 2015-06-16 15:43   ` Eric Blake
  2015-06-16 15:58     ` Dr. David Alan Gilbert
  2015-07-13 10:35   ` Juan Quintela
  2015-07-15  9:40   ` Amit Shah
  2 siblings, 1 reply; 209+ messages in thread
From: Eric Blake @ 2015-06-16 15:43 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

[-- Attachment #1: Type: text/plain, Size: 1379 bytes --]

On 06/16/2015 04:26 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> The 'postcopy ram' capability allows postcopy migration of RAM;
> note that the migration starts off in precopy mode until
> postcopy mode is triggered (see the migrate_start_postcopy
> patch later in the series).
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h |  1 +
>  migration/migration.c         | 23 +++++++++++++++++++++++
>  qapi-schema.json              |  6 +++++-
>  3 files changed, 29 insertions(+), 1 deletion(-)
> 

> +++ b/qapi-schema.json
> @@ -526,11 +526,15 @@
>  # @auto-converge: If enabled, QEMU will automatically throttle down the guest
>  #          to speed up convergence of RAM migration. (since 1.6)
>  #
> +# @x-postcopy-ram: Start executing on the migration target before all of RAM has
> +#          been migrated, pulling the remaining pages along as needed. NOTE: If
> +#          the migration fails during postcopy the VM will fail.  (since 2.4)

Marking it experimental because it might change?  Or is the interface
pretty stable, but you want more testing time to minimize bugs?

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 17/42] Add migration-capability boolean for postcopy-ram.
  2015-06-16 15:43   ` Eric Blake
@ 2015-06-16 15:58     ` Dr. David Alan Gilbert
  2015-07-15  9:39       ` Amit Shah
  0 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-16 15:58 UTC (permalink / raw)
  To: Eric Blake
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	amit.shah, pbonzini, david

* Eric Blake (eblake@redhat.com) wrote:
> On 06/16/2015 04:26 AM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > The 'postcopy ram' capability allows postcopy migration of RAM;
> > note that the migration starts off in precopy mode until
> > postcopy mode is triggered (see the migrate_start_postcopy
> > patch later in the series).
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/migration.h |  1 +
> >  migration/migration.c         | 23 +++++++++++++++++++++++
> >  qapi-schema.json              |  6 +++++-
> >  3 files changed, 29 insertions(+), 1 deletion(-)
> > 
> 
> > +++ b/qapi-schema.json
> > @@ -526,11 +526,15 @@
> >  # @auto-converge: If enabled, QEMU will automatically throttle down the guest
> >  #          to speed up convergence of RAM migration. (since 1.6)
> >  #
> > +# @x-postcopy-ram: Start executing on the migration target before all of RAM has
> > +#          been migrated, pulling the remaining pages along as needed. NOTE: If
> > +#          the migration fails during postcopy the VM will fail.  (since 2.4)
> 
> Marking it experimental because it might change?  Or is the interface
> pretty stable, but you want more testing time to minimize bugs?

It's easy enough to remove the x-  once we're all happy;  it seems pretty
stable at the moment but when we're done I'll just submit a one liner to take the x-
off.

Dave

> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works Dr. David Alan Gilbert (git)
@ 2015-06-17 11:42   ` Juan Quintela
  2015-06-17 12:30     ` Dr. David Alan Gilbert
  2015-06-18  7:50   ` Li, Liang Z
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-06-17 11:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  docs/migration.txt | 167 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 167 insertions(+)
>
> diff --git a/docs/migration.txt b/docs/migration.txt
> index f6df4be..b4b93d1 100644
> --- a/docs/migration.txt
> +++ b/docs/migration.txt
> @@ -291,3 +291,170 @@ save/send this state when we are in the middle
> of a pio operation
>  (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
>  not enabled, the values on that fields are garbage and don't need to
>  be sent.
> +
> += Return path =
> +
> +In most migration scenarios there is only a single data path that runs
> +from the source VM to the destination, typically along a single fd (although
> +possibly with another fd or similar for some fast way of throwing pages across).
> +
> +However, some uses need two way communication; in particular the
> Postcopy destination

This line is a bit long O:-)

In general, we are too close to the 80-column limit.

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 02/42] Provide runtime Target page information
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 02/42] Provide runtime Target page information Dr. David Alan Gilbert (git)
@ 2015-06-17 11:43   ` Juan Quintela
  0 siblings, 0 replies; 209+ messages in thread
From: Juan Quintela @ 2015-06-17 11:43 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The migration code generally is built target-independent, however
> there are a few places where knowing the target page size would
> avoid artificially moving stuff into migration/ram.c.
>
> Provide 'qemu_target_page_bits()' that returns TARGET_PAGE_BITS
> to other bits of code so that they can stay target-independent.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Amit Shah <amit.shah@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 03/42] Init page sizes in qtest
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 03/42] Init page sizes in qtest Dr. David Alan Gilbert (git)
@ 2015-06-17 11:49   ` Juan Quintela
  2015-07-06  6:14   ` Amit Shah
  2015-08-04  5:23   ` Amit Shah
  2 siblings, 0 replies; 209+ messages in thread
From: Juan Quintela @ 2015-06-17 11:49 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> One of my patches used a loop that was based on host page size;
> it dies in qtest since qtest hadn't bothered init'ing it.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 04/42] qemu_ram_block_from_host
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 04/42] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
@ 2015-06-17 11:54   ` Juan Quintela
  2015-07-10  8:36   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Juan Quintela @ 2015-06-17 11:54 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Postcopy sends RAMBlock names and offsets over the wire (since it can't
> rely on the order of ramaddr being the same), and it starts out with
> HVA fault addresses from the kernel.
>
> qemu_ram_block_from_host translates a HVA into a RAMBlock, an offset
> in the RAMBlock and the global ram_addr_t value.
>
> Rewrite qemu_ram_addr_from_host to use qemu_ram_block_from_host.
>
> Provide qemu_ram_get_idstr since it's the actual name text sent on the
> wire.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 05/42] Add qemu_get_buffer_less_copy to avoid copies some of the time
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 05/42] Add qemu_get_buffer_less_copy to avoid copies some of the time Dr. David Alan Gilbert (git)
@ 2015-06-17 11:57   ` Juan Quintela
  2015-06-17 12:33     ` Dr. David Alan Gilbert
  2015-07-13  9:08   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-06-17 11:57 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> qemu_get_buffer always copies the data it reads to a user's buffer,
> however in many cases the file buffer inside qemu_file could be given
> back to the caller, avoiding the copy.  This isn't always possible
> depending on the size and alignment of the data.
>
> Thus 'qemu_get_buffer_less_copy' either copies the data to a supplied
> buffer or updates a pointer to the internal buffer if convenient.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

I still don't know where this function is used, but the function is correct.

Can I suggest changing the name to:

qemu_get_buffer_in_place()?

less_copy sounds ambiguous to me :p

^ permalink raw reply	[flat|nested] 209+ messages in thread
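
The "copy or point into the internal buffer" behaviour described in the
commit message can be sketched with a toy buffer. This is not the patch's
implementation: struct in_buf and the function below are hypothetical, and
unlike a real QEMUFile there is no refill from the transport, so the fallback
path simply copies whatever is left.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the buffered part of a QEMUFile. */
struct in_buf {
    uint8_t *data;
    size_t pos;     /* next unread byte */
    size_t len;     /* number of valid bytes in data */
};

/*
 * On entry *buf points at the caller's own buffer. If the whole request
 * is already buffered, *buf is redirected into the internal buffer and
 * nothing is copied; otherwise the available bytes are copied into the
 * caller's buffer and *buf is left alone. Returns the bytes obtained.
 */
static size_t get_buffer_in_place(struct in_buf *b, uint8_t **buf,
                                  size_t size)
{
    size_t avail = b->len - b->pos;

    if (size <= avail) {
        *buf = b->data + b->pos;
        b->pos += size;
        return size;
    }
    memcpy(*buf, b->data + b->pos, avail);
    b->pos = b->len;
    return avail;
}
```

A caller has to treat the returned pointer as valid only until the next read
on the same buffer, which is the main cost of avoiding the copy.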

* Re: [Qemu-devel] [PATCH v7 06/42] Add wrapper for setting blocking status on a QEMUFile
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 06/42] Add wrapper for setting blocking status on a QEMUFile Dr. David Alan Gilbert (git)
@ 2015-06-17 11:59   ` Juan Quintela
  2015-06-17 12:34     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-06-17 11:59 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Add a wrapper to change the blocking status on a QEMUFile
> rather than having to use qemu_set_block(qemu_get_fd(f));
> it seems best to avoid exposing the fd since not all QEMUFile's
> really have one.  With this wrapper we could move the implementation
> down to be different on different transports.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Amit Shah <amit.shah@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

Can we improve naming?

> ---
>  include/migration/qemu-file.h |  1 +
>  migration/qemu-file.c         | 15 +++++++++++++++
>  2 files changed, 16 insertions(+)
>
> diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> index 29a9d69..d43c835 100644
> --- a/include/migration/qemu-file.h
> +++ b/include/migration/qemu-file.h
> @@ -193,6 +193,7 @@ int qemu_file_get_error(QEMUFile *f);
>  void qemu_file_set_error(QEMUFile *f, int ret);
>  int qemu_file_shutdown(QEMUFile *f);
>  void qemu_fflush(QEMUFile *f);
> +void qemu_file_change_blocking(QEMUFile *f, bool block);
>  
>  static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
>  {
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index c111a6b..c746129 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -651,3 +651,18 @@ size_t qemu_get_counted_string(QEMUFile *f, char buf[256])
>  
>      return res == len ? res : 0;
>  }
> +
> +/*
> + * Change the blocking state of the QEMUFile.
> + * Note: On some transports the OS only keeps a single blocking state for
> + *       both directions, and thus changing the blocking on the main
> + *       QEMUFile can also affect the return path.
> + */
> +void qemu_file_change_blocking(QEMUFile *f, bool block)

qemu_file_set_blocking?

It doesn't change the blocking state, it just sets whatever 'block' says?

> +{
> +    if (block) {
> +        qemu_set_block(qemu_get_fd(f));
> +    } else {
> +        qemu_set_nonblock(qemu_get_fd(f));
> +    }
> +}

^ permalink raw reply	[flat|nested] 209+ messages in thread
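
For readers unfamiliar with what a set-blocking helper does underneath, the
usual POSIX approach is to toggle O_NONBLOCK with fcntl(). The sketch below
assumes that this is roughly what qemu_set_block/qemu_set_nonblock do on
POSIX hosts; the name fd_set_blocking is hypothetical.

```c
#include <fcntl.h>
#include <stdbool.h>

/*
 * Set (block == true) or clear (block == false) blocking mode on fd.
 * Returns 0 on success, -1 on failure with errno set by fcntl().
 */
static int fd_set_blocking(int fd, bool block)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0) {
        return -1;
    }
    if (block) {
        flags &= ~O_NONBLOCK;
    } else {
        flags |= O_NONBLOCK;
    }
    return fcntl(fd, F_SETFL, flags) < 0 ? -1 : 0;
}
```

Because the flag is set to an absolute value rather than toggled, a "set"
style name describes the behaviour more accurately than "change", which is
the point raised in the review above.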

* Re: [Qemu-devel] [PATCH v7 07/42] ram_debug_dump_bitmap: Dump a migration bitmap as text
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 07/42] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
@ 2015-06-17 12:17   ` Juan Quintela
  2015-06-19 17:04     ` Dr. David Alan Gilbert
  2015-07-13  9:12   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-06-17 12:17 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Useful for debugging the migration bitmap and other bitmaps
> of the same format (including the sentmap in postcopy).
>
> The bitmap is printed to stderr.
> Lines that are all the expected value are excluded so the output
> can be quite compact for many bitmaps.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h |  1 +
>  migration/ram.c               | 38 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 39 insertions(+)
>
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 9387c8c..b3a7f75 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -144,6 +144,7 @@ uint64_t xbzrle_mig_pages_cache_miss(void);
>  double xbzrle_mig_cache_miss_rate(void);
>  
>  void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
> +void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
>  
>  /**
>   * @migrate_add_blocker - prevent migration from proceeding
> diff --git a/migration/ram.c b/migration/ram.c
> index 57368e1..efc215a 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1051,6 +1051,44 @@ static void reset_ram_globals(void)
>  
>  #define MAX_WAIT 50 /* ms, half buffered_file limit */
>  
> +/*
> + * 'expected' is the value you expect the bitmap mostly to be full
> + * of; it won't bother printing lines that are all this value.
> + * If 'todump' is null the migration bitmap is dumped.
> + */
> +void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
> +{
> +    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
> +
> +    int64_t cur;
> +    int64_t linelen = 128;
> +    char linebuf[129];
> +
> +    if (!todump) {
> +        todump = migration_bitmap;
> +    }

Why?  Just assert that todump != NULL?


> +
> +    for (cur = 0; cur < ram_pages; cur += linelen) {
> +        int64_t curb;
> +        bool found = false;
> +        /*
> +         * Last line; catch the case where the line length
> +         * is longer than remaining ram
> +         */
> +        if (cur + linelen > ram_pages) {
> +            linelen = ram_pages - cur;
> +        }
> +        for (curb = 0; curb < linelen; curb++) {
> +            bool thisbit = test_bit(cur + curb, todump);
> +            linebuf[curb] = thisbit ? '1' : '.';

Put 1 and 0?  Why the dot?

> +            found = found || (thisbit != expected);
> +        }
> +        if (found) {
> +            linebuf[curb] = '\0';
> +            fprintf(stderr,  "0x%08" PRIx64 " : %s\n", cur, linebuf);
> +        }
> +    }
> +}


And while we are here, why are we doing it this way?  We have

find_first_bit(addr, nbits) and find_first_zero_bit(addr, nbits) and
friends.

Doing the walk by hand looks weird, no?
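
To illustrate the suggestion, here is a minimal standalone sketch of a
dump that jumps straight to the next "interesting" bit instead of testing
every bit of every line.  It does not use QEMU's bitops: bit_is_set(),
find_next_unexpected() and dump_bitmap() are hypothetical local stand-ins
for test_bit() and the find_next_bit()/find_next_zero_bit() family, and the
128-bit line width is taken from the patch.  It also prints '0' rather than
'.' and asserts on a NULL bitmap, as asked above.

```c
#include <assert.h>
#include <inttypes.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BITS_PER_LONG ((int64_t)(sizeof(unsigned long) * CHAR_BIT))
#define LINE_BITS     128

/* Hypothetical stand-in for QEMU's test_bit(). */
static bool bit_is_set(const unsigned long *map, int64_t nr)
{
    return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/*
 * Hypothetical stand-in for find_next_bit()/find_next_zero_bit():
 * index of the first bit in [start, nbits) that differs from 'expected',
 * or nbits if every remaining bit has the expected value.
 */
static int64_t find_next_unexpected(const unsigned long *map, int64_t nbits,
                                    int64_t start, bool expected)
{
    for (int64_t i = start; i < nbits; i++) {
        if (bit_is_set(map, i) != expected) {
            return i;
        }
    }
    return nbits;
}

/* Print only lines holding an unexpected bit; returns the lines printed. */
static int dump_bitmap(FILE *out, const unsigned long *map, int64_t nbits,
                       bool expected)
{
    char linebuf[LINE_BITS + 1];
    int lines = 0;
    int64_t cur;

    assert(map != NULL);
    cur = find_next_unexpected(map, nbits, 0, expected);
    while (cur < nbits) {
        /* Round down to the start of the line containing 'cur'. */
        int64_t base = cur - (cur % LINE_BITS);
        int64_t end = base + LINE_BITS < nbits ? base + LINE_BITS : nbits;

        for (int64_t i = base; i < end; i++) {
            linebuf[i - base] = bit_is_set(map, i) ? '1' : '0';
        }
        linebuf[end - base] = '\0';
        fprintf(out, "0x%08" PRIx64 " : %s\n", (uint64_t)base, linebuf);
        lines++;
        /* Skip directly to the next line that needs printing. */
        cur = find_next_unexpected(map, nbits, end, expected);
    }
    return lines;
}
```

With a word-at-a-time find_next_bit() in place of the simple loop here,
long runs of expected bits would be skipped a word at a time rather than a
bit at a time.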

Later, Juan.


* Re: [Qemu-devel] [PATCH v7 08/42] migrate_init: Call from savevm
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 08/42] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
@ 2015-06-17 12:18   ` Juan Quintela
  2015-07-13  9:13   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Juan Quintela @ 2015-06-17 12:18 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Suspend to file is very much like a migrate, and it makes life
> easier if we have the Migration state available, so initialise it
> in the savevm.c code for suspending.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Reviewed-by: Juan Quintela <quintela@redhat.com>


* Re: [Qemu-devel] [PATCH v7 09/42] Rename save_live_complete to save_live_complete_precopy
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 09/42] Rename save_live_complete to save_live_complete_precopy Dr. David Alan Gilbert (git)
@ 2015-06-17 12:20   ` Juan Quintela
  0 siblings, 0 replies; 209+ messages in thread
From: Juan Quintela @ 2015-06-17 12:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> In postcopy we're going to need to perform the complete phase
> for postcopiable devices at a different point, start out by
> renaming all of the 'complete's to make the difference obvious.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Amit Shah <amit.shah@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

I would have called it save_live_precopy() and just dropped the 'complete'
to make the names shorter, but as I am not doing the coding ....


* Re: [Qemu-devel] [PATCH v7 10/42] Return path: Open a return path on QEMUFile for sockets
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 10/42] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
@ 2015-06-17 12:23   ` Juan Quintela
  2015-06-17 17:07     ` Dr. David Alan Gilbert
  2015-07-13 10:12   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-06-17 12:23 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Postcopy needs a method to send messages from the destination back to
> the source, this is the 'return path'.
>
> Wire it up for 'socket' QEMUFile's.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/qemu-file.h |  7 +++++
>  migration/qemu-file-unix.c    | 69 +++++++++++++++++++++++++++++++++++++------
>  migration/qemu-file.c         | 12 ++++++++
>  3 files changed, 79 insertions(+), 9 deletions(-)
>
> diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> index d43c835..7721c42 100644
> --- a/include/migration/qemu-file.h
> +++ b/include/migration/qemu-file.h
> @@ -85,6 +85,11 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,

Hi

> +/*
> + * Give a QEMUFile* off the same socket but data in the opposite
> + * direction.
> + */
> +static QEMUFile *socket_dup_return_path(void *opaque)

We call it dup

> +{
> +    QEMUFileSocket *forward = opaque;
> +    QEMUFileSocket *reverse;
> +
> +    if (qemu_file_get_error(forward->file)) {
> +        /* If the forward file is in error, don't try and open a return */
> +        return NULL;
> +    }
> +
> +    reverse = g_malloc0(sizeof(QEMUFileSocket));
> +    reverse->fd = forward->fd;

But we don't dup it :p

For the rest, I am OK with the patch.

Reviewed-by: Juan Quintela <quintela@redhat.com>


> +    /* I don't think there's a better way to tell which direction 'this' is */
> +    if (forward->file->ops->get_buffer != NULL) {
> +        /* being called from the read side, so we need to be able to write */
> +        return qemu_fopen_ops(reverse, &socket_return_write_ops);
> +    } else {
> +        return qemu_fopen_ops(reverse, &socket_return_read_ops);
> +    }
> +}
> +
>  static ssize_t unix_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
>                                    int64_t pos)
>  {
> @@ -204,18 +254,19 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
>  }
>  
>  static const QEMUFileOps socket_read_ops = {
> -    .get_fd     = socket_get_fd,
> -    .get_buffer = socket_get_buffer,
> -    .close      = socket_close,
> -    .shut_down  = socket_shutdown
> -
> +    .get_fd          = socket_get_fd,
> +    .get_buffer      = socket_get_buffer,
> +    .close           = socket_close,
> +    .shut_down       = socket_shutdown,
> +    .get_return_path = socket_dup_return_path
>  };
>  
>  static const QEMUFileOps socket_write_ops = {
> -    .get_fd        = socket_get_fd,
> -    .writev_buffer = socket_writev_buffer,
> -    .close         = socket_close,
> -    .shut_down     = socket_shutdown
> +    .get_fd          = socket_get_fd,
> +    .writev_buffer   = socket_writev_buffer,
> +    .close           = socket_close,
> +    .shut_down       = socket_shutdown,
> +    .get_return_path = socket_dup_return_path
>  };
>  
>  QEMUFile *qemu_fopen_socket(int fd, const char *mode)
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index c746129..7d9d983 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -43,6 +43,18 @@ int qemu_file_shutdown(QEMUFile *f)
>      return f->ops->shut_down(f->opaque, true, true);
>  }
>  
> +/*
> + * Result: QEMUFile* for a 'return path' for comms in the opposite direction
> + *         NULL if not available
> + */
> +QEMUFile *qemu_file_get_return_path(QEMUFile *f)
> +{
> +    if (!f->ops->get_return_path) {
> +        return NULL;
> +    }
> +    return f->ops->get_return_path(f->opaque);
> +}
> +
>  bool qemu_file_mode_is_not_valid(const char *mode)
>  {
>      if (mode == NULL ||


* Re: [Qemu-devel] [PATCH v7 11/42] Return path: socket_writev_buffer: Block even on non-blocking fd's
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 11/42] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
@ 2015-06-17 12:28   ` Juan Quintela
  2015-06-19 17:18     ` Dr. David Alan Gilbert
  2015-07-13 12:37   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-06-17 12:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The destination sets the fd to non-blocking on incoming migrations;
> this also affects the return path from the destination, and thus we
> need to make sure we can safely write to the return path.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>


>  migration/qemu-file-unix.c | 41 ++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 36 insertions(+), 5 deletions(-)
>
> diff --git a/migration/qemu-file-unix.c b/migration/qemu-file-unix.c
> index 561621e..b6c55ab 100644
> --- a/migration/qemu-file-unix.c
> +++ b/migration/qemu-file-unix.c
> @@ -39,12 +39,43 @@ static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
>      QEMUFileSocket *s = opaque;
>      ssize_t len;
>      ssize_t size = iov_size(iov, iovcnt);
> +    ssize_t offset = 0;
> +    int     err;
>  
> -    len = iov_send(s->fd, iov, iovcnt, 0, size);
> -    if (len < size) {
> -        len = -socket_error();
> -    }
> -    return len;
> +    while (size > 0) {
> +        len = iov_send(s->fd, iov, iovcnt, offset, size);
> +
> +        if (len > 0) {
> +            size -= len;
> +            offset += len;
> +        }

iov_send() can return -1 on error.

This looks like a "hacky way" to check for it, although my understanding
is that it is correct, hence the Reviewed-by.


> +
> +        if (size > 0) {
> +            err = socket_error();
> +
> +            if (err != EAGAIN && err != EWOULDBLOCK) {
> +                error_report("socket_writev_buffer: Got err=%d for (%zd/%zd)",
> +                             err, size, len);
> +                /*
> +                 * If I've already sent some but only just got the error, I
> +                 * could return the amount validly sent so far and wait for the
> +                 * next call to report the error, but I'd rather flag the error
> +                 * immediately.
> +                 */
> +                return -err;
> +            }
> +
> +            /* Emulate blocking */
> +            GPollFD pfd;
> +
> +            pfd.fd = s->fd;
> +            pfd.events = G_IO_OUT | G_IO_ERR;
> +            pfd.revents = 0;
> +            g_poll(&pfd, 1 /* 1 fd */, -1 /* no timeout */);
> +        }
> +     }
> +
> +    return offset;
>  }
>  
>  static int socket_get_fd(void *opaque)
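
For comparison, a minimal standalone sketch of the same "emulate blocking"
loop with the error return tested explicitly, rather than inferred from
size still being non-zero.  It uses plain POSIX write() and poll() instead
of iov_send() and g_poll(), and write_all_emulate_blocking() is a
hypothetical name, so treat it as an illustration of the control flow and
not as the code in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write all 'size' bytes to 'fd', waiting in poll() whenever a
 * non-blocking fd reports EAGAIN/EWOULDBLOCK.
 * Returns the number of bytes written, or -errno on a real error.
 */
static ssize_t write_all_emulate_blocking(int fd, const void *buf, size_t size)
{
    const char *p = buf;
    size_t offset = 0;

    assert(buf != NULL || size == 0);
    while (offset < size) {
        ssize_t len = write(fd, p + offset, size - offset);

        if (len > 0) {
            offset += (size_t)len;
            continue;
        }
        if (len < 0 && errno == EINTR) {
            continue;               /* interrupted, just retry */
        }
        if (len < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
            return -errno;          /* flag real errors immediately */
        }

        /* Would block: wait until the fd is writable again. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };

        if (poll(&pfd, 1, -1 /* no timeout */) < 0 && errno != EINTR) {
            return -errno;
        }
    }
    return (ssize_t)offset;
}
```

As in the patch, a partial write followed by an error reports the error
rather than the short count.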


* Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.
  2015-06-17 11:42   ` Juan Quintela
@ 2015-06-17 12:30     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-17 12:30 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  docs/migration.txt | 167 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 167 insertions(+)
> >
> > diff --git a/docs/migration.txt b/docs/migration.txt
> > index f6df4be..b4b93d1 100644
> > --- a/docs/migration.txt
> > +++ b/docs/migration.txt
> > @@ -291,3 +291,170 @@ save/send this state when we are in the middle
> > of a pio operation
> >  (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
> >  not enabled, the values on that fields are garbage and don't need to
> >  be sent.
> > +
> > += Return path =
> > +
> > +In most migration scenarios there is only a single data path that runs
> > +from the source VM to the destination, typically along a single fd (although
> > +possibly with another fd or similar for some fast way of throwing pages across).
> > +
> > +However, some uses need two way communication; in particular the
> > Postcopy destination
> 
> This line is a bit long O:-)
> 
> In general, we are too near to the 80 columns limit.

Thanks, fixed (interesting, checkpatch doesn't
seem to moan about text files).

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 12/42] Migration commands
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 12/42] Migration commands Dr. David Alan Gilbert (git)
@ 2015-06-17 12:31   ` Juan Quintela
  2015-06-19 17:38     ` Dr. David Alan Gilbert
  2015-07-13 12:45   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-06-17 12:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Create QEMU_VM_COMMAND section type for sending commands from
> source to destination.  These commands are not intended to convey
> guest state but to control the migration process.
>
> For use in postcopy.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h |  1 +
>  include/sysemu/sysemu.h       |  7 +++++++
>  migration/savevm.c            | 46 +++++++++++++++++++++++++++++++++++++++++++
>  trace-events                  |  1 +
>  4 files changed, 55 insertions(+)
>
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 414c5cf..8adaa45 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -34,6 +34,7 @@
>  #define QEMU_VM_SECTION_FULL         0x04
>  #define QEMU_VM_SUBSECTION           0x05
>  #define QEMU_VM_VMDESCRIPTION        0x06
> +#define QEMU_VM_COMMAND              0x07

This conflicts with the configuration section that was just sent.

>  #define QEMU_VM_SECTION_FOOTER       0x7e
>  
>  struct MigrationParams {
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 6dae2db..5869607 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -82,6 +82,11 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict);
>  
>  void qemu_announce_self(void);
>  
> +/* Subcommands for QEMU_VM_COMMAND */
> +enum qemu_vm_cmd {
> +    MIG_CMD_INVALID = 0,   /* Must be 0 */
> +};
> +
>  bool qemu_savevm_state_blocked(Error **errp);
>  void qemu_savevm_state_begin(QEMUFile *f,
>                               const MigrationParams *params);
> @@ -90,6 +95,8 @@ int qemu_savevm_state_iterate(QEMUFile *f);
>  void qemu_savevm_state_complete_precopy(QEMUFile *f);
>  void qemu_savevm_state_cancel(void);
>  uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
> +void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
> +                              uint16_t len, uint8_t *data);
>  int qemu_loadvm_state(QEMUFile *f);
>  
>  typedef enum DisplayType
> diff --git a/migration/savevm.c b/migration/savevm.c
> index f9168ac..7ce9d21 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -686,6 +686,23 @@ static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
>      return true;
>  }
>  
> +/* Send a 'QEMU_VM_COMMAND' type element with the command
> + * and associated data.

Please, use doxygen comments for new functions :p


> + */
> +void qemu_savevm_command_send(QEMUFile *f,
> +                              enum qemu_vm_cmd command,
> +                              uint16_t len,
> +                              uint8_t *data)
> +{
> +    qemu_put_byte(f, QEMU_VM_COMMAND);
> +    qemu_put_be16(f, (uint16_t)command);
> +    qemu_put_be16(f, len);
> +    if (len) {
> +        qemu_put_buffer(f, data, len);
> +    }
> +    qemu_fflush(f);
> +}
> +
>  bool qemu_savevm_state_blocked(Error **errp)
>  {
>      SaveStateEntry *se;
> @@ -982,6 +999,29 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
>      return NULL;
>  }
>  
> +/*
> + * Process an incoming 'QEMU_VM_COMMAND'
> + * negative return on error (will issue error message)
> + */
> +static int loadvm_process_command(QEMUFile *f)
> +{
> +    uint16_t cmd;
> +    uint16_t len;
> +
> +    cmd = qemu_get_be16(f);
> +    len = qemu_get_be16(f);
> +
> +    trace_loadvm_process_command(cmd, len);

trace for load but not for sending?

> +    switch (cmd) {
> +
> +    default:
> +        error_report("VM_COMMAND 0x%x unknown (len 0x%x)", cmd, len);
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
>  struct LoadStateEntry {
>      QLIST_ENTRY(LoadStateEntry) entry;
>      SaveStateEntry *se;
> @@ -1114,6 +1154,12 @@ int qemu_loadvm_state(QEMUFile *f)
>                  goto out;
>              }
>              break;
> +        case QEMU_VM_COMMAND:
> +            ret = loadvm_process_command(f);
> +            if (ret < 0) {
> +                goto out;
> +            }
> +            break;
>          default:
>              error_report("Unknown savevm section type %d", section_type);
>              ret = -EINVAL;
> diff --git a/trace-events b/trace-events
> index d539528..73a65c3 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1183,6 +1183,7 @@ virtio_gpu_fence_resp(uint64_t fence) "fence 0x%" PRIx64
>  qemu_loadvm_state_section(unsigned int section_type) "%d"
>  qemu_loadvm_state_section_partend(uint32_t section_id) "%u"
>  qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
> +loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
>  savevm_section_start(const char *id, unsigned int section_id) "%s, section_id %u"
>  savevm_section_end(const char *id, unsigned int section_id, int ret) "%s, section_id %u -> %d"
>  savevm_state_begin(void) ""
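
The framing that qemu_savevm_command_send() puts on the wire (one type
byte, a big-endian 16-bit command, a big-endian 16-bit length, then the
payload) can be sketched standalone as below.  encode_command() and
decode_command() are hypothetical helpers working on a plain byte buffer
rather than a QEMUFile, and the 0x07 type value is simply the one from this
patch, which may have to change given the conflict noted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VM_COMMAND_TYPE 0x07    /* value from the patch; may be renumbered */
#define VM_COMMAND_HDR  5       /* type byte + be16 command + be16 length */

/* Returns the bytes written to 'out', or 0 if 'out' is too small. */
static size_t encode_command(uint8_t *out, size_t out_size, uint16_t command,
                             const uint8_t *data, uint16_t len)
{
    size_t need = VM_COMMAND_HDR + (size_t)len;

    assert(len == 0 || data != NULL);
    if (out_size < need) {
        return 0;
    }
    out[0] = VM_COMMAND_TYPE;
    out[1] = (uint8_t)(command >> 8);
    out[2] = (uint8_t)(command & 0xff);
    out[3] = (uint8_t)(len >> 8);
    out[4] = (uint8_t)(len & 0xff);
    if (len) {
        memcpy(out + VM_COMMAND_HDR, data, len);
    }
    return need;
}

/* Returns 0 and fills the out-parameters, or -1 on a malformed buffer. */
static int decode_command(const uint8_t *in, size_t in_size,
                          uint16_t *command, uint16_t *len,
                          const uint8_t **payload)
{
    if (in_size < VM_COMMAND_HDR || in[0] != VM_COMMAND_TYPE) {
        return -1;
    }
    *command = (uint16_t)((in[1] << 8) | in[2]);
    *len = (uint16_t)((in[3] << 8) | in[4]);
    if (in_size - VM_COMMAND_HDR < *len) {
        return -1;              /* payload truncated */
    }
    *payload = in + VM_COMMAND_HDR;
    return 0;
}
```

Keeping the length in the header means a receiver can reject a short or
oversized payload before it looks at the command body.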


* Re: [Qemu-devel] [PATCH v7 05/42] Add qemu_get_buffer_less_copy to avoid copies some of the time
  2015-06-17 11:57   ` Juan Quintela
@ 2015-06-17 12:33     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-17 12:33 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > qemu_get_buffer always copies the data it reads to a users buffer,
> > however in many cases the file buffer inside qemu_file could be given
> > back to the caller, avoiding the copy.  This isn't always possible
> > depending on the size and alignment of the data.
> >
> > Thus 'qemu_get_buffer_less_copy' either copies the data to a supplied
> > buffer or updates a pointer to the internal buffer if convenient.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> 
> I still don't know where this function is used, but the function is correct.
> 
> Can I suggest changing the name to:
> 
> qemu_get_buffer_in_place()?
> 
> less_copy sounds ambiguous to me :p

Changed.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 06/42] Add wrapper for setting blocking status on a QEMUFile
  2015-06-17 11:59   ` Juan Quintela
@ 2015-06-17 12:34     ` Dr. David Alan Gilbert
  2015-06-17 12:57       ` Juan Quintela
  0 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-17 12:34 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Add a wrapper to change the blocking status on a QEMUFile
> > rather than having to use qemu_set_block(qemu_get_fd(f));
> > it seems best to avoid exposing the fd since not all QEMUFile's
> > really have one.  With this wrapper we could move the implementation
> > down to be different on different transports.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: Amit Shah <amit.shah@redhat.com>
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> 
> Can we improve naming?
> 
> > ---
> >  include/migration/qemu-file.h |  1 +
> >  migration/qemu-file.c         | 15 +++++++++++++++
> >  2 files changed, 16 insertions(+)
> >
> > diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> > index 29a9d69..d43c835 100644
> > --- a/include/migration/qemu-file.h
> > +++ b/include/migration/qemu-file.h
> > @@ -193,6 +193,7 @@ int qemu_file_get_error(QEMUFile *f);
> >  void qemu_file_set_error(QEMUFile *f, int ret);
> >  int qemu_file_shutdown(QEMUFile *f);
> >  void qemu_fflush(QEMUFile *f);
> > +void qemu_file_change_blocking(QEMUFile *f, bool block);
> >  
> >  static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
> >  {
> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> > index c111a6b..c746129 100644
> > --- a/migration/qemu-file.c
> > +++ b/migration/qemu-file.c
> > @@ -651,3 +651,18 @@ size_t qemu_get_counted_string(QEMUFile *f, char buf[256])
> >  
> >      return res == len ? res : 0;
> >  }
> > +
> > +/*
> > + * Change the blocking state of the QEMUFile.
> > + * Note: On some transports the OS only keeps a single blocking state for
> > + *       both directions, and thus changing the blocking on the main
> > + *       QEMUFile can also affect the return path.
> > + */
> > +void qemu_file_change_blocking(QEMUFile *f, bool block)
> 
> qemu_file_set_blocking?
> 
> > It doesn't change the blocking state, it just does whatever 'block' says?
> 
> > +{
> > +    if (block) {
> > +        qemu_set_block(qemu_get_fd(f));
> > +    } else {
> > +        qemu_set_nonblock(qemu_get_fd(f));
> > +    }
> > +}

I worry about having a:
   qemu_file_set_blocking
and a 
   qemu_set_block

it sounds a bit similar when one always 'sets' (i.e. turns on)
and the other either turns on or off.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 13/42] Return path: Control commands
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 13/42] Return path: Control commands Dr. David Alan Gilbert (git)
@ 2015-06-17 12:49   ` Juan Quintela
  2015-06-23 18:57     ` Dr. David Alan Gilbert
  2015-07-13 12:55   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-06-17 12:49 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Add two src->dest commands:
>    * OPEN_RETURN_PATH - To request that the destination open the return path
>    * PING - Request an acknowledge from the destination
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  include/migration/migration.h |  2 ++
>  include/sysemu/sysemu.h       |  6 ++++-
>  migration/savevm.c            | 60 +++++++++++++++++++++++++++++++++++++++++++
>  trace-events                  |  2 ++
>  4 files changed, 69 insertions(+), 1 deletion(-)
>
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 8adaa45..65fe5db 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -47,6 +47,8 @@ typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
>  struct MigrationIncomingState {
>      QEMUFile *file;
>  
> +    QEMUFile *return_path;
> +
>      /* See savevm.c */
>      LoadStateEntry_Head loadvm_handlers;
>  };
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 5869607..d8875ca 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -84,7 +84,9 @@ void qemu_announce_self(void);
>  
>  /* Subcommands for QEMU_VM_COMMAND */
>  enum qemu_vm_cmd {
> -    MIG_CMD_INVALID = 0,   /* Must be 0 */
> +    MIG_CMD_INVALID = 0,       /* Must be 0 */
> +    MIG_CMD_OPEN_RETURN_PATH,  /* Tell the dest to open the Return path */
> +    MIG_CMD_PING,              /* Request a PONG on the RP */

Add MIG_CMD_MAX at the end of the enum, and then:

static struct cmd_args {
    int len;
    const char *name;
} cmd_args[] = {
    [MIG_CMD_INVALID] = {
        .len = 0,
        .name = "CMD_INVALID" },
    [MIG_CMD_OPEN_RETURN_PATH] = {
        .len = 0,
        .name = "CMD_OPEN_RETURN_PATH" },
    .....
};

I have done the initialization by hand, not sure if syntax is ok.


>  static int loadvm_process_command(QEMUFile *f)
>  {
> +    MigrationIncomingState *mis = migration_incoming_get_current();
>      uint16_t cmd;
>      uint16_t len;
> +    uint32_t tmp32;
>  
>      cmd = qemu_get_be16(f);
>      len = qemu_get_be16(f);
>  
>      trace_loadvm_process_command(cmd, len);

/* yes, this should go in the previous patch */

if (cmd >= MIG_CMD_MAX) {
    error_report("VM_COMMAND 0x%x unknown (len 0x%x)", cmd, len);
    .....
}

if (loadvm_process_command_simple_lencheck(cmd_args[cmd].name, len,
                                           cmd_args[cmd].len)) {
    return -1;
}

switch (cmd) {
case MIG_CMD_OPEN_RETURN_PATH:
        if (mis->return_path) {
            error_report("CMD_OPEN_RETURN_PATH called when RP already open");
            /* Not really a problem, so don't give up */
            return 0;
        }
        mis->return_path = qemu_file_get_return_path(f);
        if (!mis->return_path) {
            error_report("CMD_OPEN_RETURN_PATH failed");
            return -1;
        }
        break;

.....
}

You get the idea.  I normally even put the code from the command in a
function pointer, but I know that other people don't like it.
This way you factor the common code at the beginning of the function.

What do you think?
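
Spelled out as a standalone sketch, the table-driven check could look
like the following.  The enum values mirror the patch, but the enum name
mig_cmd, the MIG_CMD_MAX sentinel, the layout of cmd_args and the
check_command() helper are all assumptions made for illustration;
check_command() stands in for the bounds test plus
loadvm_process_command_simple_lencheck(), and fprintf() stands in for
error_report().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum mig_cmd {
    MIG_CMD_INVALID = 0,        /* Must be 0 */
    MIG_CMD_OPEN_RETURN_PATH,   /* Tell the dest to open the return path */
    MIG_CMD_PING,               /* Request a PONG on the return path */
    MIG_CMD_MAX                 /* Sentinel: number of known commands */
};

/* Expected payload length and printable name for each command. */
static const struct {
    uint16_t len;
    const char *name;
} cmd_args[MIG_CMD_MAX] = {
    [MIG_CMD_INVALID]          = { .len = 0, .name = "CMD_INVALID" },
    [MIG_CMD_OPEN_RETURN_PATH] = { .len = 0, .name = "CMD_OPEN_RETURN_PATH" },
    [MIG_CMD_PING]             = { .len = sizeof(uint32_t),
                                   .name = "CMD_PING" },
};

/* Common validation done once, before the per-command switch. */
static int check_command(uint16_t cmd, uint16_t len)
{
    if (cmd == MIG_CMD_INVALID || cmd >= MIG_CMD_MAX) {
        fprintf(stderr, "VM_COMMAND 0x%x unknown (len 0x%x)\n", cmd, len);
        return -1;
    }
    assert(cmd_args[cmd].name != NULL);
    if (len != cmd_args[cmd].len) {
        fprintf(stderr, "%s received with bad length - expecting %u, got %u\n",
                cmd_args[cmd].name, (unsigned)cmd_args[cmd].len,
                (unsigned)len);
        return -1;
    }
    return 0;
}
```

The bounds test has to be >= MIG_CMD_MAX, since the sentinel itself is one
past the last valid index into the table.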


>      switch (cmd) {
> +    case MIG_CMD_OPEN_RETURN_PATH:
> +        if (loadvm_process_command_simple_lencheck("CMD_OPEN_RETURN_PATH",
> +                                                   len, 0)) {
> +            return -1;
> +        }
> +        if (mis->return_path) {
> +            error_report("CMD_OPEN_RETURN_PATH called when RP already open");
> +            /* Not really a problem, so don't give up */
> +            return 0;
> +        }
> +        mis->return_path = qemu_file_get_return_path(f);
> +        if (!mis->return_path) {
> +            error_report("CMD_OPEN_RETURN_PATH failed");
> +            return -1;
> +        }
> +        break;
> +
> +    case MIG_CMD_PING:
> +        if (loadvm_process_command_simple_lencheck("CMD_PING", len,
> +                                                   sizeof(tmp32))) {
> +            return -1;
> +        }
> +        tmp32 = qemu_get_be32(f);
> +        trace_loadvm_process_command_ping(tmp32);
> +        if (!mis->return_path) {
> +            error_report("CMD_PING (0x%x) received with no return path",
> +                         tmp32);
> +            return -1;
> +        }
> +        /* migrate_send_rp_pong(mis, tmp32); TODO: gets added later */
> +        break;
>  
>      default:
>          error_report("VM_COMMAND 0x%x unknown (len 0x%x)", cmd, len);
> diff --git a/trace-events b/trace-events
> index 73a65c3..5967fdf 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1184,8 +1184,10 @@ qemu_loadvm_state_section(unsigned int section_type) "%d"
>  qemu_loadvm_state_section_partend(uint32_t section_id) "%u"
>  qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
>  loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
> +loadvm_process_command_ping(uint32_t val) "%x"
>  savevm_section_start(const char *id, unsigned int section_id) "%s, section_id %u"
>  savevm_section_end(const char *id, unsigned int section_id, int ret) "%s, section_id %u -> %d"
> +savevm_send_ping(uint32_t val) "%x"
>  savevm_state_begin(void) ""
>  savevm_state_header(void) ""
>  savevm_state_iterate(void) ""


* Re: [Qemu-devel] [PATCH v7 06/42] Add wrapper for setting blocking status on a QEMUFile
  2015-06-17 12:34     ` Dr. David Alan Gilbert
@ 2015-06-17 12:57       ` Juan Quintela
  0 siblings, 0 replies; 209+ messages in thread
From: Juan Quintela @ 2015-06-17 12:57 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
>> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
>> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> >
>> > Add a wrapper to change the blocking status on a QEMUFile
>> > rather than having to use qemu_set_block(qemu_get_fd(f));
>> > it seems best to avoid exposing the fd since not all QEMUFile's
>> > really have one.  With this wrapper we could move the implementation
>> > down to be different on different transports.
>> >
>> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> > Reviewed-by: Amit Shah <amit.shah@redhat.com>
>> 
>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>> 
>> Can we improve naming?
>> 
>> > ---
>> >  include/migration/qemu-file.h |  1 +
>> >  migration/qemu-file.c         | 15 +++++++++++++++
>> >  2 files changed, 16 insertions(+)
>> >
>> > diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
>> > index 29a9d69..d43c835 100644
>> > --- a/include/migration/qemu-file.h
>> > +++ b/include/migration/qemu-file.h
>> > @@ -193,6 +193,7 @@ int qemu_file_get_error(QEMUFile *f);
>> >  void qemu_file_set_error(QEMUFile *f, int ret);
>> >  int qemu_file_shutdown(QEMUFile *f);
>> >  void qemu_fflush(QEMUFile *f);
>> > +void qemu_file_change_blocking(QEMUFile *f, bool block);
>> >  
>> >  static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
>> >  {
>> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> > index c111a6b..c746129 100644
>> > --- a/migration/qemu-file.c
>> > +++ b/migration/qemu-file.c
>> > @@ -651,3 +651,18 @@ size_t qemu_get_counted_string(QEMUFile *f, char buf[256])
>> >  
>> >      return res == len ? res : 0;
>> >  }
>> > +
>> > +/*
>> > + * Change the blocking state of the QEMUFile.
>> > + * Note: On some transports the OS only keeps a single blocking state for
>> > + *       both directions, and thus changing the blocking on the main
>> > + *       QEMUFile can also affect the return path.
>> > + */
>> > +void qemu_file_change_blocking(QEMUFile *f, bool block)
>> 
>> qemu_file_set_blocking?
>> 
>> It doesn't change the blocking; it just does whatever 'block' says?
>> 
>> > +{
>> > +    if (block) {
>> > +        qemu_set_block(qemu_get_fd(f));
>> > +    } else {
>> > +        qemu_set_nonblock(qemu_get_fd(f));
>> > +    }
>> > +}
>
> I worry about having a:
>    qemu_file_set_blocking
> and a 
>    qemu_set_block
>
> it sounds a bit similar when one always 'sets' (i.e. turns on)
> and the other either turns on or off.

There is a parameter difference, but I am not writing the code, and
don't care so much.  I would expect a function with 'change' in its name
to switch to the other blocking state, whatever that is :P
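
To illustrate the 'set' reading, here is a minimal sketch that assumes a
plain POSIX fd rather than a QEMUFile.  fd_set_blocking() is a
hypothetical name, not something in the tree; it puts the fd into
whichever state 'block' asks for, so calling it twice with the same value
changes nothing the second time.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not QEMU code): put 'fd' into blocking mode when
 * 'block' is true and into non-blocking mode otherwise.
 * Returns 0 on success, -1 on failure (errno is set by fcntl()).
 */
int fd_set_blocking(int fd, bool block)
{
    int flags;

    assert(fd >= 0);

    flags = fcntl(fd, F_GETFL);
    if (flags == -1) {
        return -1;
    }
    if (block) {
        flags &= ~O_NONBLOCK;   /* clear the flag: calls may block */
    } else {
        flags |= O_NONBLOCK;    /* set the flag: calls return EAGAIN */
    }
    return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
}
```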

Later, Juan.

* Re: [Qemu-devel] [PATCH v7 14/42] Return path: Send responses from destination to source
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 14/42] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
@ 2015-06-17 16:30   ` Juan Quintela
  2015-06-19 18:42     ` Dr. David Alan Gilbert
  2015-07-15  7:31   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-06-17 16:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Add migrate_send_rp_message to send a message from destination to source along the return path.
>   (It uses a mutex to let it be called from multiple threads)
> Add migrate_send_rp_shut to send a 'shut' message to indicate
>   the destination is finished with the RP.
> Add migrate_send_rp_ack to send a 'PONG' message in response to a PING
>   Use it in the MSG_RP_PING handler
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h | 17 ++++++++++++++++
>  migration/migration.c         | 45 +++++++++++++++++++++++++++++++++++++++++++
>  migration/savevm.c            |  2 +-
>  trace-events                  |  1 +
>  4 files changed, 64 insertions(+), 1 deletion(-)
>
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 65fe5db..36caab9 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -42,12 +42,20 @@ struct MigrationParams {
>      bool shared;
>  };
>  
> +/* Messages sent on the return path from destination to source */
> +enum mig_rp_message_type {
> +    MIG_RP_MSG_INVALID = 0,  /* Must be 0 */
> +    MIG_RP_MSG_SHUT,         /* sibling will not send any more RP messages */
> +    MIG_RP_MSG_PONG,         /* Response to a PING; data (seq: be32 ) */
> +};
> +
>  typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
>  /* State for the incoming migration */
>  struct MigrationIncomingState {
>      QEMUFile *file;
>  
>      QEMUFile *return_path;
> +    QemuMutex      rp_mutex;    /* We send replies from multiple threads */
>  
>      /* See savevm.c */
>      LoadStateEntry_Head loadvm_handlers;
> @@ -179,6 +187,15 @@ int migrate_compress_level(void);
>  int migrate_compress_threads(void);
>  int migrate_decompress_threads(void);
>  
> +/* Sending on the return path - generic and then for each message type */
> +void migrate_send_rp_message(MigrationIncomingState *mis,
> +                             enum mig_rp_message_type message_type,
> +                             uint16_t len, void *data);
> +void migrate_send_rp_shut(MigrationIncomingState *mis,
> +                          uint32_t value);
> +void migrate_send_rp_pong(MigrationIncomingState *mis,
> +                          uint32_t value);
> +
>  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
>  void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
>  void ram_control_load_hook(QEMUFile *f, uint64_t flags);
> diff --git a/migration/migration.c b/migration/migration.c
> index 295f15a..afb19a1 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -85,6 +85,7 @@ MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
>      mis_current = g_malloc0(sizeof(MigrationIncomingState));
>      mis_current->file = f;
>      QLIST_INIT(&mis_current->loadvm_handlers);
> +    qemu_mutex_init(&mis_current->rp_mutex);
>  
>      return mis_current;
>  }
> @@ -182,6 +183,50 @@ void process_incoming_migration(QEMUFile *f)
>      qemu_coroutine_enter(co, f);
>  }
>  
> +/*
> + * Send a message on the return channel back to the source
> + * of the migration.
> + */
> +void migrate_send_rp_message(MigrationIncomingState *mis,
> +                             enum mig_rp_message_type message_type,
> +                             uint16_t len, void *data)
> +{
> +    trace_migrate_send_rp_message((int)message_type, len);
> +    qemu_mutex_lock(&mis->rp_mutex);
> +    qemu_put_be16(mis->return_path, (unsigned int)message_type);
> +    qemu_put_be16(mis->return_path, len);
if (len) {
> +    qemu_put_buffer(mis->return_path, data, len);
}

?

We check for zero-sized commands on control commands but not on
responses?
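
A minimal sketch of such a guard, assuming a flat output buffer in place
of the QEMUFile.  rp_frame_message() and its parameters are made up for
illustration and are not part of the patch; only the wire layout (be16
type, be16 length, then the payload) follows the code quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-alone framing helper (not the QEMU function):
 * writes a big-endian 16-bit type, a big-endian 16-bit length and then
 * 'len' payload bytes into 'out'.  The payload copy is skipped when
 * 'len' is 0, so 'data' may be NULL for empty messages.
 * Returns the number of bytes written, or 0 if 'out' is too small.
 */
size_t rp_frame_message(uint8_t *out, size_t out_size,
                        uint16_t type, uint16_t len, const void *data)
{
    size_t total = 4 + (size_t)len;

    assert(out != NULL);

    if (out_size < total) {
        return 0;
    }
    out[0] = (uint8_t)(type >> 8);
    out[1] = (uint8_t)(type & 0xff);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)(len & 0xff);
    if (len) {
        memcpy(out + 4, data, len);
    }
    return total;
}
```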

> +    qemu_fflush(mis->return_path);
> +    qemu_mutex_unlock(&mis->rp_mutex);
> +}
> +
> +/*
> + * Send a 'SHUT' message on the return channel with the given value
> + * to indicate that we've finished with the RP.  Non-0 value indicates
> + * error.
> + */
> +void migrate_send_rp_shut(MigrationIncomingState *mis,
> +                          uint32_t value)
> +{
> +    uint32_t buf;
> +
> +    buf = cpu_to_be32(value);
> +    migrate_send_rp_message(mis, MIG_RP_MSG_SHUT, sizeof(buf), &buf);
> +}
> +
> +/*
> + * Send a 'PONG' message on the return channel with the given value
> + * (normally in response to a 'PING')
> + */
> +void migrate_send_rp_pong(MigrationIncomingState *mis,
> +                          uint32_t value)
> +{
> +    uint32_t buf;
> +
> +    buf = cpu_to_be32(value);
> +    migrate_send_rp_message(mis, MIG_RP_MSG_PONG, sizeof(buf), &buf);
> +}
> +
>  /* amount of nanoseconds we are willing to wait for migration to be down.
>   * the choice of nanoseconds is because it is the maximum resolution that
>   * get_clock() can achieve. It is an internal measure. All user-visible
> diff --git a/migration/savevm.c b/migration/savevm.c
> index a995014..d424c2a 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1071,7 +1071,7 @@ static int loadvm_process_command(QEMUFile *f)
>                           tmp32);
>              return -1;
>          }
> -        /* migrate_send_rp_pong(mis, tmp32); TODO: gets added later */
> +        migrate_send_rp_pong(mis, tmp32);
>          break;
>  
>      default:
> diff --git a/trace-events b/trace-events
> index 5967fdf..5738e3f 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1399,6 +1399,7 @@ migrate_fd_cleanup(void) ""
>  migrate_fd_error(void) ""
>  migrate_fd_cancel(void) ""
>  migrate_pending(uint64_t size, uint64_t max) "pending size %" PRIu64 " max %" PRIu64
> +migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
>  migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
>  
>  # migration/rdma.c

* Re: [Qemu-devel] [PATCH v7 10/42] Return path: Open a return path on QEMUFile for sockets
  2015-06-17 12:23   ` Juan Quintela
@ 2015-06-17 17:07     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-17 17:07 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Postcopy needs a method to send messages from the destination back to
> > the source, this is the 'return path'.
> >
> > Wire it up for 'socket' QEMUFile's.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/qemu-file.h |  7 +++++
> >  migration/qemu-file-unix.c    | 69 +++++++++++++++++++++++++++++++++++++------
> >  migration/qemu-file.c         | 12 ++++++++
> >  3 files changed, 79 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> > index d43c835..7721c42 100644
> > --- a/include/migration/qemu-file.h
> > +++ b/include/migration/qemu-file.h
> > @@ -85,6 +85,11 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
> 
> Hi
> 
> > +/*
> > + * Give a QEMUFile* off the same socket but data in the opposite
> > + * direction.
> > + */
> > +static QEMUFile *socket_dup_return_path(void *opaque)
> 
> We call it dup
> 
> > +{
> > +    QEMUFileSocket *forward = opaque;
> > +    QEMUFileSocket *reverse;
> > +
> > +    if (qemu_file_get_error(forward->file)) {
> > +        /* If the forward file is in error, don't try and open a return */
> > +        return NULL;
> > +    }
> > +
> > +    reverse = g_malloc0(sizeof(QEMUFileSocket));
> > +    reverse->fd = forward->fd;
> 
> But we don't dup it :p

Oh yeh, we used to :-)  I've replaced _dup with _get

> For the rest, I am ok with the patch.
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>

Thanks.

Dave

> 
> 
> > +    /* I don't think there's a better way to tell which direction 'this' is */
> > +    if (forward->file->ops->get_buffer != NULL) {
> > +        /* being called from the read side, so we need to be able to write */
> > +        return qemu_fopen_ops(reverse, &socket_return_write_ops);
> > +    } else {
> > +        return qemu_fopen_ops(reverse, &socket_return_read_ops);
> > +    }
> > +}
> > +
> >  static ssize_t unix_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
> >                                    int64_t pos)
> >  {
> > @@ -204,18 +254,19 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
> >  }
> >  
> >  static const QEMUFileOps socket_read_ops = {
> > -    .get_fd     = socket_get_fd,
> > -    .get_buffer = socket_get_buffer,
> > -    .close      = socket_close,
> > -    .shut_down  = socket_shutdown
> > -
> > +    .get_fd          = socket_get_fd,
> > +    .get_buffer      = socket_get_buffer,
> > +    .close           = socket_close,
> > +    .shut_down       = socket_shutdown,
> > +    .get_return_path = socket_dup_return_path
> >  };
> >  
> >  static const QEMUFileOps socket_write_ops = {
> > -    .get_fd        = socket_get_fd,
> > -    .writev_buffer = socket_writev_buffer,
> > -    .close         = socket_close,
> > -    .shut_down     = socket_shutdown
> > +    .get_fd          = socket_get_fd,
> > +    .writev_buffer   = socket_writev_buffer,
> > +    .close           = socket_close,
> > +    .shut_down       = socket_shutdown,
> > +    .get_return_path = socket_dup_return_path
> >  };
> >  
> >  QEMUFile *qemu_fopen_socket(int fd, const char *mode)
> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> > index c746129..7d9d983 100644
> > --- a/migration/qemu-file.c
> > +++ b/migration/qemu-file.c
> > @@ -43,6 +43,18 @@ int qemu_file_shutdown(QEMUFile *f)
> >      return f->ops->shut_down(f->opaque, true, true);
> >  }
> >  
> > +/*
> > + * Result: QEMUFile* for a 'return path' for comms in the opposite direction
> > + *         NULL if not available
> > + */
> > +QEMUFile *qemu_file_get_return_path(QEMUFile *f)
> > +{
> > +    if (!f->ops->get_return_path) {
> > +        return NULL;
> > +    }
> > +    return f->ops->get_return_path(f->opaque);
> > +}
> > +
> >  bool qemu_file_mode_is_not_valid(const char *mode)
> >  {
> >      if (mode == NULL ||
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works Dr. David Alan Gilbert (git)
  2015-06-17 11:42   ` Juan Quintela
@ 2015-06-18  7:50   ` Li, Liang Z
  2015-06-18  8:10     ` Dr. David Alan Gilbert
  2015-06-18  8:28     ` Paolo Bonzini
  2015-06-26  6:46   ` Yang Hongyang
  2015-08-04  5:20   ` Amit Shah
  3 siblings, 2 replies; 209+ messages in thread
From: Li, Liang Z @ 2015-06-18  7:50 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel@nongnu.org
  Cc: aarcange@redhat.com, yamahata@private.email.ne.jp,
	quintela@redhat.com, luis@cs.umu.se, amit.shah@redhat.com,
	pbonzini@redhat.com, david@gibson.dropbear.id.au

> diff --git a/docs/migration.txt b/docs/migration.txt index f6df4be..b4b93d1
> 100644
> --- a/docs/migration.txt
> +++ b/docs/migration.txt
> @@ -291,3 +291,170 @@ save/send this state when we are in the middle of a
> pio operation  (that is what ide_drive_pio_state_needed() checks).  If
> DRQ_STAT is  not enabled, the values on that fields are garbage and don't
> need to  be sent.
> +
> += Return path =
> +
> +In most migration scenarios there is only a single data path that runs
> +from the source VM to the destination, typically along a single fd
> +(although possibly with another fd or similar for some fast way of throwing
> pages across).
> +
> +However, some uses need two way communication; in particular the
> +Postcopy destination needs to be able to request pages on demand from
> the source.
> +
> +For these scenarios there is a 'return path' from the destination to
> +the source;
> +qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for
> +the return path.
> +
> +  Source side
> +     Forward path - written by migration thread
> +     Return path  - opened by main thread, read by return-path thread
> +
> +  Destination side
> +     Forward path - read by main thread
> +     Return path  - opened by main thread, written by main thread AND
> postcopy
> +                    thread (protected by rp_mutex)
> +
> += Postcopy =
> +'Postcopy' migration is a way to deal with migrations that refuse to
> +converge; its plus side is that there is an upper bound on the amount
> +of migration traffic and time it takes, the down side is that during
> +the postcopy phase, a failure of
> +*either* side or the network connection causes the guest to be lost.

Hi David,

Do you have any ideas or plans for dealing with a failure that happens
during the postcopy phase?

Losing the guest is too frightening for a cloud provider.  We had a
discussion with Alibaba, and they said that they can't use the postcopy
feature unless there is a mechanism to get the guest back.

Liang

* Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.
  2015-06-18  7:50   ` Li, Liang Z
@ 2015-06-18  8:10     ` Dr. David Alan Gilbert
  2015-06-18  8:28     ` Paolo Bonzini
  1 sibling, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-18  8:10 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: aarcange@redhat.com, yamahata@private.email.ne.jp,
	quintela@redhat.com, qemu-devel@nongnu.org, luis@cs.umu.se,
	amit.shah@redhat.com, pbonzini@redhat.com,
	david@gibson.dropbear.id.au

* Li, Liang Z (liang.z.li@intel.com) wrote:
> > diff --git a/docs/migration.txt b/docs/migration.txt index f6df4be..b4b93d1
> > 100644
> > --- a/docs/migration.txt
> > +++ b/docs/migration.txt
> > @@ -291,3 +291,170 @@ save/send this state when we are in the middle of a
> > pio operation  (that is what ide_drive_pio_state_needed() checks).  If
> > DRQ_STAT is  not enabled, the values on that fields are garbage and don't
> > need to  be sent.
> > +
> > += Return path =
> > +
> > +In most migration scenarios there is only a single data path that runs
> > +from the source VM to the destination, typically along a single fd
> > +(although possibly with another fd or similar for some fast way of throwing
> > pages across).
> > +
> > +However, some uses need two way communication; in particular the
> > +Postcopy destination needs to be able to request pages on demand from
> > the source.
> > +
> > +For these scenarios there is a 'return path' from the destination to
> > +the source;
> > +qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for
> > +the return path.
> > +
> > +  Source side
> > +     Forward path - written by migration thread
> > +     Return path  - opened by main thread, read by return-path thread
> > +
> > +  Destination side
> > +     Forward path - read by main thread
> > +     Return path  - opened by main thread, written by main thread AND
> > postcopy
> > +                    thread (protected by rp_mutex)
> > +
> > += Postcopy =
> > +'Postcopy' migration is a way to deal with migrations that refuse to
> > +converge; its plus side is that there is an upper bound on the amount
> > +of migration traffic and time it takes, the down side is that during
> > +the postcopy phase, a failure of
> > +*either* side or the network connection causes the guest to be lost.
> 
> Hi David,
> 
> Do you have any ideas or plans for dealing with a failure that happens
> during the postcopy phase?
> 
> Losing the guest is too frightening for a cloud provider.  We had a
> discussion with Alibaba, and they said that they can't use the postcopy
> feature unless there is a mechanism to get the guest back.

The VM memory image is still on the source, so you could restart the
source; however, that's not safe, because once the destination has
started running it is sending out packets and also modifying the block
storage.  If you restarted the source at that point, what block and net
state can you accept being visible?

Dave

> 
> Liang
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.
  2015-06-18  7:50   ` Li, Liang Z
  2015-06-18  8:10     ` Dr. David Alan Gilbert
@ 2015-06-18  8:28     ` Paolo Bonzini
  2015-06-19 17:52       ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 209+ messages in thread
From: Paolo Bonzini @ 2015-06-18  8:28 UTC (permalink / raw)
  To: Li, Liang Z, Dr. David Alan Gilbert (git), qemu-devel@nongnu.org
  Cc: aarcange@redhat.com, yamahata@private.email.ne.jp,
	quintela@redhat.com, luis@cs.umu.se, amit.shah@redhat.com,
	david@gibson.dropbear.id.au



On 18/06/2015 09:50, Li, Liang Z wrote:
> Do you have any ideas or plans for dealing with a failure that happens
> during the postcopy phase?
> 
> Losing the guest is too frightening for a cloud provider.  We had a
> discussion with Alibaba, and they said that they can't use the postcopy
> feature unless there is a mechanism to get the guest back.

There's no solution to this problem, except for rollback to a previous
snapshot.

To give an idea, an example of an intended use case for postcopy is
evacuating a datacenter within 30 minutes of a tsunami alert.  That's not
a case where you care much about losing guests to network failures.

Why is there no solution?  Let's look at one of the best surveys on
migration,
http://courses.cs.vt.edu/~cs5204/fall05-kafura/Papers/Migration/ProcessMigration.pdf
(warning, 59 pages!):

  [3.2] If only part of the task state is transferred to another node,
  the task can start executing sooner, and the initial migration costs
  are lower.

  [3.4] Fault resilience can be improved in several ways. The impact of
  failures during migration can be reduced by maintaining process state
  on both the source and destination sites until the destination site
  instance is successfully promoted to a regular process and the source
  node is informed about this.

  [3.5] Migration algorithms should avoid linear dependencies on the
  amount of state to be transferred. For example, the eager data
  transfer strategy has costs proportional to the address space size

"Pre"copy means "start copying *before* promoting the destination to be
the primary host" and it has such a linear dependency on the amount of
state to be transferred. "Post"copy means "delay some copying to *after*
promoting the destination to be the primary host".

So we have:

                           Precopy            Postcopy
   3.2 Performance            - (1)             - (2)
   3.4 Fault resilience       +                 -
   3.5 Scalability            -                 +

      (1) smaller impact, longer freeze time
      (2) larger impact, extremely short freeze time

Postcopy can also limit the length of the non-resilient phase, by
starting with a precopy phase and only switching to postcopy after some
time.  Then you have:

                           Precopy        Hybrid      Postcopy
   3.2 Performance            - (1)          + (3)        - (2)
   3.4 Fault resilience       +              -            --
   3.5 Scalability            -              +            +

      (3) intermediate impact, extremely short freeze time

but there is still going to be a phase where migration is not resilient
to network faults.

Cloud operators can use a combination of precopy and postcopy.  For
example, I would not use postcopy for mass migration when doing
host updates, but it can be used as a last resort before a scheduled
downtime.

For example, say you're doing a rolling update and you want it complete
by next Sunday.  90% of the guests are shut down by the customers or can
be migrated successfully with precopy.  The others do not converge and
their SLA does not let you throttle them to complete precopy migration.

You then tell your customers that either they shut down and restart their
instances before Saturday 8:00 PM, or they might be shut down forcibly.
Then for customers who haven't rebooted you can do postcopy; you have
alerted them that something might go wrong.  So even though postcopy
would not be a first choice, it can still help cloud operators.

Paolo

* Re: [Qemu-devel] [PATCH v7 07/42] ram_debug_dump_bitmap: Dump a migration bitmap as text
  2015-06-17 12:17   ` Juan Quintela
@ 2015-06-19 17:04     ` Dr. David Alan Gilbert
  2015-07-13 10:15       ` Juan Quintela
  0 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-19 17:04 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Useful for debugging the migration bitmap and other bitmaps
> > of the same format (including the sentmap in postcopy).
> >
> > The bitmap is printed to stderr.
> > Lines that are all the expected value are excluded so the output
> > can be quite compact for many bitmaps.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/migration.h |  1 +
> >  migration/ram.c               | 38 ++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 39 insertions(+)
> >
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index 9387c8c..b3a7f75 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -144,6 +144,7 @@ uint64_t xbzrle_mig_pages_cache_miss(void);
> >  double xbzrle_mig_cache_miss_rate(void);
> >  
> >  void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
> > +void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
> >  
> >  /**
> >   * @migrate_add_blocker - prevent migration from proceeding
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 57368e1..efc215a 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1051,6 +1051,44 @@ static void reset_ram_globals(void)
> >  
> >  #define MAX_WAIT 50 /* ms, half buffered_file limit */
> >  
> > +/*
> > + * 'expected' is the value you expect the bitmap mostly to be full
> > + * of; it won't bother printing lines that are all this value.
> > + * If 'todump' is null the migration bitmap is dumped.
> > + */
> > +void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
> > +{
> > +    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
> > +
> > +    int64_t cur;
> > +    int64_t linelen = 128;
> > +    char linebuf[129];
> > +
> > +    if (!todump) {
> > +        todump = migration_bitmap;
> > +    }
> 
> > Why?  Just assert that todump != NULL?

'migration_bitmap' is static to ram.c, so allowing NULL to get
you a dump of the migration_bitmap means that if you call this
dump routine from any error path you're debugging anywhere in qemu
then you can dump the migration bitmap.  e.g. I was adding calls
to this in migration.c and migration/postcopy-ram.c in error
paths I was trying to debug.

> > +    for (cur = 0; cur < ram_pages; cur += linelen) {
> > +        int64_t curb;
> > +        bool found = false;
> > +        /*
> > +         * Last line; catch the case where the line length
> > +         * is longer than remaining ram
> > +         */
> > +        if (cur + linelen > ram_pages) {
> > +            linelen = ram_pages - cur;
> > +        }
> > +        for (curb = 0; curb < linelen; curb++) {
> > +            bool thisbit = test_bit(cur + curb, todump);
> > +            linebuf[curb] = thisbit ? '1' : '.';
> 
> Put 1 and 0?  Why the dot?

It's easier to see an occasional '1' in a big field of .'s.

> > +            found = found || (thisbit != expected);
> > +        }
> > +        if (found) {
> > +            linebuf[curb] = '\0';
> > +            fprintf(stderr,  "0x%08" PRIx64 " : %s\n", cur, linebuf);
> > +        }
> > +    }
> > +}
> 
> 
> And once here, why are we doing it this way?  We have
> 
> find_first_bit(addr, nbits) and find_first_zero_bit(addr, nbits) and
> friends?
> 
> Doing the walk by hand looks weird, no?

Here's a compile-tested-only version using the find_ helpers.  It's
bigger; if you think it's better I can use this instead:

void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
{
    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;

    int64_t cur;
    int64_t linelen = 128;

    if (!todump) {
        todump = migration_bitmap;
    }

    for (cur = 0; cur < ram_pages; cur += linelen) {
        int64_t curb;
        unsigned long next_bit;

        /*
         * Last line; catch the case where the line length
         * is longer than remaining ram
         */
        if (cur + linelen > ram_pages) {
            linelen = ram_pages - cur;
        }
        /* Only print lines where some bit differs from 'expected' */
        if (expected) {
            next_bit = find_next_zero_bit(todump, cur + linelen, cur);
        } else {
            next_bit = find_next_bit(todump, cur + linelen, cur);
        }
        if (next_bit >= (cur + linelen)) {
            continue;
        }

        for (curb = 0; curb < linelen; curb++) {
            bool thisbit = test_bit(cur + curb, todump);
            fputc(thisbit ? '1' : '.', stderr);
        }
        fputc('\n', stderr);
    }
}
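
The rule the original loop applies (render a line, then print it only if
some bit differs from 'expected') can be sketched on its own.  This
assumes a plain byte-array bitmap rather than QEMU's unsigned long
bitmaps, and bitmap_test() and bitmap_format_line() are hypothetical
names used only here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: test bit 'nr' in a byte-array bitmap. */
static bool bitmap_test(const uint8_t *map, size_t nr)
{
    return (map[nr / 8] >> (nr % 8)) & 1;
}

/*
 * Render bits [start, start + len) into 'line' as '1' / '.' and
 * NUL-terminate it; 'line' must hold len + 1 bytes.
 * Returns true if any bit differs from 'expected', i.e. the line is
 * worth printing.
 */
bool bitmap_format_line(const uint8_t *map, size_t start, size_t len,
                        bool expected, char *line)
{
    bool interesting = false;
    size_t i;

    assert(map != NULL && line != NULL);

    for (i = 0; i < len; i++) {
        bool bit = bitmap_test(map, start + i);

        line[i] = bit ? '1' : '.';
        interesting = interesting || (bit != expected);
    }
    line[len] = '\0';
    return interesting;
}
```

A caller would walk the bitmap in fixed-size chunks and print 'line'
only when the function returns true.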

Dave


> 
> Later, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 11/42] Return path: socket_writev_buffer: Block even on non-blocking fd's
  2015-06-17 12:28   ` Juan Quintela
@ 2015-06-19 17:18     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-19 17:18 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > The destination sets the fd to non-blocking on incoming migrations;
> > this also affects the return path from the destination, and thus we
> > need to make sure we can safely write to the return path.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>

Thanks,

> >  migration/qemu-file-unix.c | 41 ++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 36 insertions(+), 5 deletions(-)
> >
> > diff --git a/migration/qemu-file-unix.c b/migration/qemu-file-unix.c
> > index 561621e..b6c55ab 100644
> > --- a/migration/qemu-file-unix.c
> > +++ b/migration/qemu-file-unix.c
> > @@ -39,12 +39,43 @@ static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
> >      QEMUFileSocket *s = opaque;
> >      ssize_t len;
> >      ssize_t size = iov_size(iov, iovcnt);
> > +    ssize_t offset = 0;
> > +    int     err;
> >  
> > -    len = iov_send(s->fd, iov, iovcnt, 0, size);
> > -    if (len < size) {
> > -        len = -socket_error();
> > -    }
> > -    return len;
> > +    while (size > 0) {
> > +        len = iov_send(s->fd, iov, iovcnt, offset, size);
> > +
> > +        if (len > 0) {
> > +            size -= len;
> > +            offset += len;
> > +        }
> 
> iov_send() can return -1 on error
> 
> This looks like a "hacky way" to check for it, although my understanding
> is that it is correct, hence the Reviewed-by

I have to check socket_error() in two cases: one where len == -1, and
the other where only some of the data was sent.  Given that I have to do
it in both cases, it doesn't seem worth explicitly checking for the -1.
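
For comparison, a stand-alone sketch of the same 'emulate blocking' idea,
assuming a plain POSIX fd with write() and poll() instead of iov_send()
and g_poll().  write_all_blocking() is a hypothetical name.  Unlike the
patch, this variant looks at errno only when write() itself fails, and
simply retries after a short write:

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-alone helper: write all 'size' bytes of 'buf' to
 * 'fd', waiting with poll() whenever a non-blocking fd reports
 * EAGAIN/EWOULDBLOCK.  Returns 'size' on success or -errno on failure.
 */
ssize_t write_all_blocking(int fd, const void *buf, size_t size)
{
    const char *p = buf;
    size_t done = 0;

    assert(buf != NULL || size == 0);

    while (done < size) {
        ssize_t len = write(fd, p + done, size - done);

        if (len > 0) {
            done += (size_t)len;    /* partial or full write: keep going */
            continue;
        }
        if (len < 0 && errno == EINTR) {
            continue;               /* interrupted: just retry */
        }
        if (len < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
            return -errno;          /* real error: report it immediately */
        }

        /* Would block: wait (no timeout) until the fd is writable */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        (void)poll(&pfd, 1, -1);
    }
    return (ssize_t)done;
}
```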

Dave

> > +
> > +        if (size > 0) {
> > +            err = socket_error();
> > +
> > +            if (err != EAGAIN && err != EWOULDBLOCK) {
> > +                error_report("socket_writev_buffer: Got err=%d for (%zd/%zd)",
> > +                             err, size, len);
> > +                /*
> > +                 * If I've already sent some but only just got the error, I
> > +                 * could return the amount validly sent so far and wait for the
> > +                 * next call to report the error, but I'd rather flag the error
> > +                 * immediately.
> > +                 */
> > +                return -err;
> > +            }
> > +
> > +            /* Emulate blocking */
> > +            GPollFD pfd;
> > +
> > +            pfd.fd = s->fd;
> > +            pfd.events = G_IO_OUT | G_IO_ERR;
> > +            pfd.revents = 0;
> > +            g_poll(&pfd, 1 /* 1 fd */, -1 /* no timeout */);
> > +        }
> > +     }
> > +
> > +    return offset;
> >  }
> >  
> >  static int socket_get_fd(void *opaque)
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 12/42] Migration commands
  2015-06-17 12:31   ` Juan Quintela
@ 2015-06-19 17:38     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-19 17:38 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Create QEMU_VM_COMMAND section type for sending commands from
> > source to destination.  These commands are not intended to convey
> > guest state but to control the migration process.
> >
> > For use in postcopy.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/migration.h |  1 +
> >  include/sysemu/sysemu.h       |  7 +++++++
> >  migration/savevm.c            | 46 +++++++++++++++++++++++++++++++++++++++++++
> >  trace-events                  |  1 +
> >  4 files changed, 55 insertions(+)
> >
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index 414c5cf..8adaa45 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -34,6 +34,7 @@
> >  #define QEMU_VM_SECTION_FULL         0x04
> >  #define QEMU_VM_SUBSECTION           0x05
> >  #define QEMU_VM_VMDESCRIPTION        0x06
> > +#define QEMU_VM_COMMAND              0x07
> 
> conflicts with configuration section just sent

Inc'd to 8.

> >  #define QEMU_VM_SECTION_FOOTER       0x7e
> >  
> >  struct MigrationParams {
> > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > index 6dae2db..5869607 100644
> > --- a/include/sysemu/sysemu.h
> > +++ b/include/sysemu/sysemu.h
> > @@ -82,6 +82,11 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict);
> >  
> >  void qemu_announce_self(void);
> >  
> > +/* Subcommands for QEMU_VM_COMMAND */
> > +enum qemu_vm_cmd {
> > +    MIG_CMD_INVALID = 0,   /* Must be 0 */
> > +};
> > +
> >  bool qemu_savevm_state_blocked(Error **errp);
> >  void qemu_savevm_state_begin(QEMUFile *f,
> >                               const MigrationParams *params);
> > @@ -90,6 +95,8 @@ int qemu_savevm_state_iterate(QEMUFile *f);
> >  void qemu_savevm_state_complete_precopy(QEMUFile *f);
> >  void qemu_savevm_state_cancel(void);
> >  uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
> > +void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd command,
> > +                              uint16_t len, uint8_t *data);
> >  int qemu_loadvm_state(QEMUFile *f);
> >  
> >  typedef enum DisplayType
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index f9168ac..7ce9d21 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -686,6 +686,23 @@ static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
> >      return true;
> >  }
> >  
> > +/* Send a 'QEMU_VM_COMMAND' type element with the command
> > + * and associated data.
> 
> Please, use doxygen comments for new functions :p

Done.

> 
> > + */
> > +void qemu_savevm_command_send(QEMUFile *f,
> > +                              enum qemu_vm_cmd command,
> > +                              uint16_t len,
> > +                              uint8_t *data)
> > +{
> > +    qemu_put_byte(f, QEMU_VM_COMMAND);
> > +    qemu_put_be16(f, (uint16_t)command);
> > +    qemu_put_be16(f, len);
> > +    if (len) {
> > +        qemu_put_buffer(f, data, len);
> > +    }
> > +    qemu_fflush(f);
> > +}
> > +
> >  bool qemu_savevm_state_blocked(Error **errp)
> >  {
> >      SaveStateEntry *se;
> > @@ -982,6 +999,29 @@ static SaveStateEntry *find_se(const char *idstr, int instance_id)
> >      return NULL;
> >  }
> >  
> > +/*
> > + * Process an incoming 'QEMU_VM_COMMAND'
> > + * negative return on error (will issue error message)
> > + */
> > +static int loadvm_process_command(QEMUFile *f)
> > +{
> > +    uint16_t cmd;
> > +    uint16_t len;
> > +
> > +    cmd = qemu_get_be16(f);
> > +    len = qemu_get_be16(f);
> > +
> > +    trace_loadvm_process_command(cmd, len);
> 
> trace for load but not for sending?

Added.

Dave

> > +    switch (cmd) {
> > +
> > +    default:
> > +        error_report("VM_COMMAND 0x%x unknown (len 0x%x)", cmd, len);
> > +        return -1;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  struct LoadStateEntry {
> >      QLIST_ENTRY(LoadStateEntry) entry;
> >      SaveStateEntry *se;
> > @@ -1114,6 +1154,12 @@ int qemu_loadvm_state(QEMUFile *f)
> >                  goto out;
> >              }
> >              break;
> > +        case QEMU_VM_COMMAND:
> > +            ret = loadvm_process_command(f);
> > +            if (ret < 0) {
> > +                goto out;
> > +            }
> > +            break;
> >          default:
> >              error_report("Unknown savevm section type %d", section_type);
> >              ret = -EINVAL;
> > diff --git a/trace-events b/trace-events
> > index d539528..73a65c3 100644
> > --- a/trace-events
> > +++ b/trace-events
> > @@ -1183,6 +1183,7 @@ virtio_gpu_fence_resp(uint64_t fence) "fence 0x%" PRIx64
> >  qemu_loadvm_state_section(unsigned int section_type) "%d"
> >  qemu_loadvm_state_section_partend(uint32_t section_id) "%u"
> >  qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
> > +loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
> >  savevm_section_start(const char *id, unsigned int section_id) "%s, section_id %u"
> >  savevm_section_end(const char *id, unsigned int section_id, int ret) "%s, section_id %u -> %d"
> >  savevm_state_begin(void) ""
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.
  2015-06-18  8:28     ` Paolo Bonzini
@ 2015-06-19 17:52       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-19 17:52 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: aarcange@redhat.com, yamahata@private.email.ne.jp,
	quintela@redhat.com, Li, Liang Z, qemu-devel@nongnu.org,
	luis@cs.umu.se, amit.shah@redhat.com, david@gibson.dropbear.id.au

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> 
> 
> On 18/06/2015 09:50, Li, Liang Z wrote:
> > Do you have any idea or plan for dealing with a failure that happens
> > during the postcopy phase?
> > 
> > Losing the guest is too frightening for a cloud provider; we had a
> > discussion with Alibaba, and they said that they can't use the postcopy
> > feature unless there is a mechanism to get the guest back.
> 
> There's no solution to this problem, except for rollback to a previous
> snapshot.

Yes, and you might be able to avoid some of the pain if you COWd the
disk data on the destination until the migration was finished; that would
allow you to restart the source VM in the state prior to postcopy starting;
although the network's view of it is going to be very messy.

> To give an idea, an example of an intended usecase for postcopy is
> datacenter evacuation in 30 minutes after a tsunami alert.  That's not a
> case where you care much about losing guests to network failures.

Well; you have to make a call as to what your best option is; you could
always shut the VM down and boot it up fresh in your new safe data centre.
Your preference depends on how confident you are that your VM would boot
back up safely, how long that would take, how much you trust the network
during the migration period, and the pain you know you would face
if you explicitly shut the VM down.

> Cloud operators can use a combination of precopy and postcopy.  For
> example, I would not use postcopy for mass migration when doing
> host updates, but it can be used as a last resort before a scheduled
> downtime.
> 
> For example, say you're doing a rolling update and you want it complete
> by next Sunday.  90% of the guests are shut down by the customers or can
> be migrated successfully with precopy.  The others do not converge and
> their SLA does not let you throttle them to complete precopy migration.

Indeed the interface lets you do that pretty easily; as long as you
have enabled postcopy, it starts in precopy mode and is fully recoverable
until you issue 'migrate_start_postcopy', which might be once it has
tried 'n' times and you can see that the workload you have isn't going
to converge.

Dave

> You then tell your customers that either they shutdown and restart their
> instances before Saturday 8:00 PM, or they might be shut down forcibly.
>  Then for customers who haven't rebooted you can do
> postcopy---you have alerted them that something might go wrong.  So even
> though postcopy would not be a first choice, it can still help cloud
> operators.
> 
> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 14/42] Return path: Send responses from destination to source
  2015-06-17 16:30   ` Juan Quintela
@ 2015-06-19 18:42     ` Dr. David Alan Gilbert
  2015-07-01  9:29       ` Juan Quintela
  0 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-19 18:42 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Add migrate_send_rp_message to send a message from destination to source along the return path.
> >   (It uses a mutex to let it be called from multiple threads)
> > Add migrate_send_rp_shut to send a 'shut' message to indicate
> >   the destination is finished with the RP.
> > Add migrate_send_rp_ack to send a 'PONG' message in response to a PING
> >   Use it in the MSG_RP_PING handler
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/migration.h | 17 ++++++++++++++++
> >  migration/migration.c         | 45 +++++++++++++++++++++++++++++++++++++++++++
> >  migration/savevm.c            |  2 +-
> >  trace-events                  |  1 +
> >  4 files changed, 64 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index 65fe5db..36caab9 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -42,12 +42,20 @@ struct MigrationParams {
> >      bool shared;
> >  };
> >  
> > +/* Messages sent on the return path from destination to source */
> > +enum mig_rp_message_type {
> > +    MIG_RP_MSG_INVALID = 0,  /* Must be 0 */
> > +    MIG_RP_MSG_SHUT,         /* sibling will not send any more RP messages */
> > +    MIG_RP_MSG_PONG,         /* Response to a PING; data (seq: be32 ) */
> > +};
> > +
> >  typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
> >  /* State for the incoming migration */
> >  struct MigrationIncomingState {
> >      QEMUFile *file;
> >  
> >      QEMUFile *return_path;
> > +    QemuMutex      rp_mutex;    /* We send replies from multiple threads */
> >  
> >      /* See savevm.c */
> >      LoadStateEntry_Head loadvm_handlers;
> > @@ -179,6 +187,15 @@ int migrate_compress_level(void);
> >  int migrate_compress_threads(void);
> >  int migrate_decompress_threads(void);
> >  
> > +/* Sending on the return path - generic and then for each message type */
> > +void migrate_send_rp_message(MigrationIncomingState *mis,
> > +                             enum mig_rp_message_type message_type,
> > +                             uint16_t len, void *data);
> > +void migrate_send_rp_shut(MigrationIncomingState *mis,
> > +                          uint32_t value);
> > +void migrate_send_rp_pong(MigrationIncomingState *mis,
> > +                          uint32_t value);
> > +
> >  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
> >  void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
> >  void ram_control_load_hook(QEMUFile *f, uint64_t flags);
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 295f15a..afb19a1 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -85,6 +85,7 @@ MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
> >      mis_current = g_malloc0(sizeof(MigrationIncomingState));
> >      mis_current->file = f;
> >      QLIST_INIT(&mis_current->loadvm_handlers);
> > +    qemu_mutex_init(&mis_current->rp_mutex);
> >  
> >      return mis_current;
> >  }
> > @@ -182,6 +183,50 @@ void process_incoming_migration(QEMUFile *f)
> >      qemu_coroutine_enter(co, f);
> >  }
> >  
> > +/*
> > + * Send a message on the return channel back to the source
> > + * of the migration.
> > + */
> > +void migrate_send_rp_message(MigrationIncomingState *mis,
> > +                             enum mig_rp_message_type message_type,
> > +                             uint16_t len, void *data)
> > +{
> > +    trace_migrate_send_rp_message((int)message_type, len);
> > +    qemu_mutex_lock(&mis->rp_mutex);
> > +    qemu_put_be16(mis->return_path, (unsigned int)message_type);
> > +    qemu_put_be16(mis->return_path, len);
> if (len) {
> 
> > +    qemu_put_buffer(mis->return_path, data, len);
> }
> 
> 
> ?
> 
> We check for zero sized command on control commands but not on
> responses?

Or should I remove the check in the control commands case?
qemu_put_buffer looks like it's safe for size == 0
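
For reference, the framing under discussion (be16 message type, be16
length, then the payload) can be sketched in standalone C. This is only an
illustration: rp_frame is a hypothetical helper, not a QEMU function, and
it fills a plain byte buffer instead of a QEMUFile. It keeps the explicit
zero-length guard because handing memcpy a null pointer is undefined even
for a length of 0; whether qemu_put_buffer needs the same guard is a
separate question that this sketch does not answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: encode one return-path message into 'out'.
 * Layout: 2 bytes type (big endian), 2 bytes length (big endian),
 * then 'len' bytes of payload.  Returns the number of bytes written,
 * or 0 if 'out' is too small.
 */
static size_t rp_frame(uint8_t *out, size_t cap,
                       uint16_t type, uint16_t len, const void *data)
{
    size_t need = 4 + (size_t)len;

    assert(len == 0 || data != NULL);

    if (cap < need) {
        return 0;
    }
    out[0] = (uint8_t)(type >> 8);
    out[1] = (uint8_t)(type & 0xff);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)(len & 0xff);
    if (len) {
        /* Only touch 'data' when there is a payload to copy. */
        memcpy(out + 4, data, len);
    }
    return need;
}
```

With the guard in place a zero-length message such as a bare command is
just the 4-byte header.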

Dave

> 
> > +    qemu_fflush(mis->return_path);
> > +    qemu_mutex_unlock(&mis->rp_mutex);
> > +}
> > +
> > +/*
> > + * Send a 'SHUT' message on the return channel with the given value
> > + * to indicate that we've finished with the RP.  Non-0 value indicates
> > + * error.
> > + */
> > +void migrate_send_rp_shut(MigrationIncomingState *mis,
> > +                          uint32_t value)
> > +{
> > +    uint32_t buf;
> > +
> > +    buf = cpu_to_be32(value);
> > +    migrate_send_rp_message(mis, MIG_RP_MSG_SHUT, sizeof(buf), &buf);
> > +}
> > +
> > +/*
> > + * Send a 'PONG' message on the return channel with the given value
> > + * (normally in response to a 'PING')
> > + */
> > +void migrate_send_rp_pong(MigrationIncomingState *mis,
> > +                          uint32_t value)
> > +{
> > +    uint32_t buf;
> > +
> > +    buf = cpu_to_be32(value);
> > +    migrate_send_rp_message(mis, MIG_RP_MSG_PONG, sizeof(buf), &buf);
> > +}
> > +
> >  /* amount of nanoseconds we are willing to wait for migration to be down.
> >   * the choice of nanoseconds is because it is the maximum resolution that
> >   * get_clock() can achieve. It is an internal measure. All user-visible
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index a995014..d424c2a 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -1071,7 +1071,7 @@ static int loadvm_process_command(QEMUFile *f)
> >                           tmp32);
> >              return -1;
> >          }
> > -        /* migrate_send_rp_pong(mis, tmp32); TODO: gets added later */
> > +        migrate_send_rp_pong(mis, tmp32);
> >          break;
> >  
> >      default:
> > diff --git a/trace-events b/trace-events
> > index 5967fdf..5738e3f 100644
> > --- a/trace-events
> > +++ b/trace-events
> > @@ -1399,6 +1399,7 @@ migrate_fd_cleanup(void) ""
> >  migrate_fd_error(void) ""
> >  migrate_fd_cancel(void) ""
> >  migrate_pending(uint64_t size, uint64_t max) "pending size %" PRIu64 " max %" PRIu64
> > +migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
> >  migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
> >  
> >  # migration/rdma.c
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 13/42] Return path: Control commands
  2015-06-17 12:49   ` Juan Quintela
@ 2015-06-23 18:57     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-23 18:57 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Add two src->dest commands:
> >    * OPEN_RETURN_PATH - To request that the destination open the return path
> >    * PING - Request an acknowledge from the destination
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  include/migration/migration.h |  2 ++
> >  include/sysemu/sysemu.h       |  6 ++++-
> >  migration/savevm.c            | 60 +++++++++++++++++++++++++++++++++++++++++++
> >  trace-events                  |  2 ++
> >  4 files changed, 69 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index 8adaa45..65fe5db 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -47,6 +47,8 @@ typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
> >  struct MigrationIncomingState {
> >      QEMUFile *file;
> >  
> > +    QEMUFile *return_path;
> > +
> >      /* See savevm.c */
> >      LoadStateEntry_Head loadvm_handlers;
> >  };
> > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > index 5869607..d8875ca 100644
> > --- a/include/sysemu/sysemu.h
> > +++ b/include/sysemu/sysemu.h
> > @@ -84,7 +84,9 @@ void qemu_announce_self(void);
> >  
> >  /* Subcommands for QEMU_VM_COMMAND */
> >  enum qemu_vm_cmd {
> > -    MIG_CMD_INVALID = 0,   /* Must be 0 */
> > +    MIG_CMD_INVALID = 0,       /* Must be 0 */
> > +    MIG_CMD_OPEN_RETURN_PATH,  /* Tell the dest to open the Return path */
> > +    MIG_CMD_PING,              /* Request a PONG on the RP */
> 
> Add    MIG_CMD_MAX
>     
> 
> struct cmd_args {
>        int len;
>        char *name;
> } cmd_args[] ={
> .[MIG_CMD_INVALID] = {
>    .len = 0,
>    .name = "CMD_INVALID"},
> .[MIG_CMD_OPEN_RETURN_PATH] = {
>    .len = 0,
>    .name = "CMD_OPEN_RETURN_PATH"},
> .....
> }
> }
> 
> I have done the initialization by hand, not sure if syntax is ok.

Pretty close, no '.' before the [] - we end up with (at the end of the series):

static struct mig_cmd_args {
    ssize_t     len; /* -1 = variable */
    const char *name;
} mig_cmd_args[] = {
    [MIG_CMD_INVALID]          = { .len = -1, .name = "INVALID" },
    [MIG_CMD_OPEN_RETURN_PATH] = { .len =  0, .name = "OPEN_RETURN_PATH" },
    [MIG_CMD_PING]             = { .len = sizeof(uint32_t), .name = "PING" },
    [MIG_CMD_POSTCOPY_ADVISE]  = { .len = 16, .name = "POSTCOPY_ADVISE" },
    [MIG_CMD_POSTCOPY_LISTEN]  = { .len =  0, .name = "POSTCOPY_LISTEN" },
    [MIG_CMD_POSTCOPY_RUN]     = { .len =  0, .name = "POSTCOPY_RUN" },
    [MIG_CMD_POSTCOPY_RAM_DISCARD] =
                                 { .len = -1, .name = "POSTCOPY_RAM_DISCARD" },
    [MIG_CMD_MAX]              = { .len = -1, .name = "MAX" },
};
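
To show how a table like this can factor the length check out of the
switch, here is a minimal standalone sketch. It is not the code from the
series: the enum is cut down to three commands, check_cmd_len is a
hypothetical name, and the table omits the MAX entry so that its size can
be compared with MIG_CMD_MAX directly.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

/* Reduced, illustrative copy of the command enum. */
enum mig_cmd {
    MIG_CMD_INVALID = 0,
    MIG_CMD_OPEN_RETURN_PATH,
    MIG_CMD_PING,
    MIG_CMD_MAX
};

static const struct {
    ssize_t     len;   /* -1 = variable */
    const char *name;
} cmd_table[] = {
    [MIG_CMD_INVALID]          = { .len = -1, .name = "INVALID" },
    [MIG_CMD_OPEN_RETURN_PATH] = { .len =  0, .name = "OPEN_RETURN_PATH" },
    [MIG_CMD_PING]             = { .len = sizeof(uint32_t), .name = "PING" },
};

/*
 * Hypothetical helper: return 0 if 'len' is acceptable for 'cmd',
 * -1 if the command is out of range or the length does not match.
 */
static int check_cmd_len(uint16_t cmd, uint16_t len)
{
    ssize_t want;

    assert(sizeof(cmd_table) / sizeof(cmd_table[0]) == MIG_CMD_MAX);

    if (cmd >= MIG_CMD_MAX) {
        return -1;              /* unknown command: never index the table */
    }
    want = cmd_table[cmd].len;
    if (want != -1 && want != (ssize_t)len) {
        return -1;              /* fixed-length command with wrong length */
    }
    return 0;
}
```

Note the range test is '>=' rather than '>', so that MIG_CMD_MAX itself is
rejected before the table is indexed.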

> >  static int loadvm_process_command(QEMUFile *f)
> >  {
> > +    MigrationIncomingState *mis = migration_incoming_get_current();
> >      uint16_t cmd;
> >      uint16_t len;
> > +    uint32_t tmp32;
> >  
> >      cmd = qemu_get_be16(f);
> >      len = qemu_get_be16(f);
> >  
> >      trace_loadvm_process_command(cmd, len);
> 
> /* yes, this should go in previous patch */

Yes, done.

> if (cmd > MIG_CMD_MAX ) {
>             error_report("VM_COMMAND 0x%x unknown (len 0x%x)", cmd, len);
>      .....
> }
> 
> if (loadvm_process_command_simple_lencheck(cmd_args[cmd].name, len,
> cmd_args[cmd].len) {
>                    return -1;
> }
> 
> switch (cmd) {
> case MIG_CMD_OPEN_RETURN_PATH:
>         if (mis->return_path) {
>             error_report("CMD_OPEN_RETURN_PATH called when RP already open");
>             /* Not really a problem, so don't give up */
>             return 0;
>         }
>         mis->return_path = qemu_file_get_return_path(f);
>         if (!mis->return_path) {
>             error_report("CMD_OPEN_RETURN_PATH failed");
>             return -1;
>         }
>         break;
> 
> .....
> }
> 
> You get the idea.  I normally even put the code from the command in a
> function pointer, but I know that other people don't like it.
> This way you factor the common code at the beginning of the function.
> 
> What do you think?

Yep, done.

Dave

> 
> 
> >      switch (cmd) {
> > +    case MIG_CMD_OPEN_RETURN_PATH:
> > +        if (loadvm_process_command_simple_lencheck("CMD_OPEN_RETURN_PATH",
> > +                                                   len, 0)) {
> > +            return -1;
> > +        }
> > +        if (mis->return_path) {
> > +            error_report("CMD_OPEN_RETURN_PATH called when RP already open");
> > +            /* Not really a problem, so don't give up */
> > +            return 0;
> > +        }
> > +        mis->return_path = qemu_file_get_return_path(f);
> > +        if (!mis->return_path) {
> > +            error_report("CMD_OPEN_RETURN_PATH failed");
> > +            return -1;
> > +        }
> > +        break;
> > +
> > +    case MIG_CMD_PING:
> > +        if (loadvm_process_command_simple_lencheck("CMD_PING", len,
> > +                                                   sizeof(tmp32))) {
> > +            return -1;
> > +        }
> > +        tmp32 = qemu_get_be32(f);
> > +        trace_loadvm_process_command_ping(tmp32);
> > +        if (!mis->return_path) {
> > +            error_report("CMD_PING (0x%x) received with no return path",
> > +                         tmp32);
> > +            return -1;
> > +        }
> > +        /* migrate_send_rp_pong(mis, tmp32); TODO: gets added later */
> > +        break;
> >  
> >      default:
> >          error_report("VM_COMMAND 0x%x unknown (len 0x%x)", cmd, len);
> > diff --git a/trace-events b/trace-events
> > index 73a65c3..5967fdf 100644
> > --- a/trace-events
> > +++ b/trace-events
> > @@ -1184,8 +1184,10 @@ qemu_loadvm_state_section(unsigned int section_type) "%d"
> >  qemu_loadvm_state_section_partend(uint32_t section_id) "%u"
> >  qemu_loadvm_state_section_startfull(uint32_t section_id, const char *idstr, uint32_t instance_id, uint32_t version_id) "%u(%s) %u %u"
> >  loadvm_process_command(uint16_t com, uint16_t len) "com=0x%x len=%d"
> > +loadvm_process_command_ping(uint32_t val) "%x"
> >  savevm_section_start(const char *id, unsigned int section_id) "%s, section_id %u"
> >  savevm_section_end(const char *id, unsigned int section_id, int ret) "%s, section_id %u -> %d"
> > +savevm_send_ping(uint32_t val) "%x"
> >  savevm_state_begin(void) ""
> >  savevm_state_header(void) ""
> >  savevm_state_iterate(void) ""
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works Dr. David Alan Gilbert (git)
  2015-06-17 11:42   ` Juan Quintela
  2015-06-18  7:50   ` Li, Liang Z
@ 2015-06-26  6:46   ` Yang Hongyang
  2015-06-26  7:53     ` zhanghailiang
  2015-08-04  5:20   ` Amit Shah
  3 siblings, 1 reply; 209+ messages in thread
From: Yang Hongyang @ 2015-06-26  6:46 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, luis, amit.shah,
	pbonzini, david

Hi Dave,

On 06/16/2015 06:26 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
[...]
> += Postcopy =
> +'Postcopy' migration is a way to deal with migrations that refuse to converge;
> +its plus side is that there is an upper bound on the amount of migration traffic
> +and time it takes, the down side is that during the postcopy phase, a failure of
> +*either* side or the network connection causes the guest to be lost.
> +
> +In postcopy the destination CPUs are started before all the memory has been
> +transferred, and accesses to pages that are yet to be transferred cause
> +a fault that's translated by QEMU into a request to the source QEMU.

I have an immature idea:
Can we keep a source RAM cache on the destination QEMU, instead of sending
requests to the source QEMU? That is:
  - When start_postcopy is issued, the source is paused and __opens another
    socket (maybe another migration thread)__ to send the remaining dirty
    pages to the destination; at the same time, the destination starts and
    caches the remaining pages.
  - When a page fault occurs, first look up the page in the CACHE; if it has
    not yet been received, request it from the source QEMU.
  - Once the remaining dirty pages are transferred, the source QEMU can exit.

The existing postcopy mechanism does not need to be changed, just add the
remaining page transfer mechanism, and the RAM cache.

I don't know if it is feasible and whether it will bring improvement to the
postcopy, what do you think?

> +
> +Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
> +doesn't finish in a given time the switch is made to postcopy.
> +
> +=== Enabling postcopy ===
> +
> +To enable postcopy (prior to the start of migration):
> +
> +migrate_set_capability x-postcopy-ram on
> +
> +The migration will still start in precopy mode, however issuing:
> +
> +migrate_start_postcopy
> +
> +will now cause the transition from precopy to postcopy.
> +It can be issued immediately after migration is started or any
> +time later on.  Issuing it after the end of a migration is harmless.
> +
> +=== Postcopy device transfer ===
> +
> +Loading of device data may cause the device emulation to access guest RAM
> +that may trigger faults that have to be resolved by the source, as such
> +the migration stream has to be able to respond with page data *during* the
> +device load, and hence the device data has to be read from the stream completely
> +before the device load begins to free the stream up.  This is achieved by
> +'packaging' the device data into a blob that's read in one go.
> +
> +Source behaviour
> +
> +Until postcopy is entered the migration stream is identical to normal
> +precopy, except for the addition of a 'postcopy advise' command at
> +the beginning, to tell the destination that postcopy might happen.
> +When postcopy starts the source sends the page discard data and then
> +forms the 'package' containing:
> +
> +   Command: 'postcopy listen'
> +   The device state
> +      A series of sections, identical to the precopy streams device state stream
> +      containing everything except postcopiable devices (i.e. RAM)
> +   Command: 'postcopy run'
> +
> +The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
> +contents are formatted in the same way as the main migration stream.
> +
> +Destination behaviour
> +
> +Initially the destination looks the same as precopy, with a single thread
> +reading the migration stream; the 'postcopy advise' and 'discard' commands
> +are processed to change the way RAM is managed, but don't affect the stream
> +processing.
> +
> +------------------------------------------------------------------------------
> +                        1      2   3     4 5                      6   7
> +main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
> +thread                             |       |
> +                                   |     (page request)
> +                                   |        \___
> +                                   v            \
> +listen thread:                     --- page -- page -- page -- page -- page --
> +
> +                                   a   b        c
> +------------------------------------------------------------------------------
> +
> +On receipt of CMD_PACKAGED (1)
> +   All the data associated with the package - the ( ... ) section in the
> +diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
> +recurses into qemu_loadvm_state_main to process the contents of the package (2)
> +which contains commands (3,6) and devices (4...)
> +
> +On receipt of 'postcopy listen' - 3 -(i.e. the 1st command in the package)
> +a new thread (a) is started that takes over servicing the migration stream,
> +while the main thread carries on loading the package.   It loads normal
> +background page data (b) but if during a device load a fault happens (5) the
> +returned page (c) is loaded by the listen thread allowing the main threads
> +device load to carry on.
> +
> +The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
> +CPUs start running.
> +At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
> +and is no longer used by migration, while the listen thread carries
> +on servicing page data until the end of migration.
> +
> +=== Postcopy states ===
> +
> +Postcopy moves through a series of states (see postcopy_state) from
> +ADVISE->LISTEN->RUNNING->END
> +
> +  Advise: Set at the start of migration if postcopy is enabled, even
> +          if it hasn't had the start command; here the destination
> +          checks that its OS has the support needed for postcopy, and performs
> +          setup to ensure the RAM mappings are suitable for later postcopy.
> +          (Triggered by reception of POSTCOPY_ADVISE command)
> +
> +  Listen: The first command in the package, POSTCOPY_LISTEN, switches
> +          the destination state to Listen, and starts a new thread
> +          (the 'listen thread') which takes over the job of receiving
> +          pages off the migration stream, while the main thread carries
> +          on processing the blob.  With this thread able to process page
> +          reception, the destination now 'sensitises' the RAM to detect
> +          any access to missing pages (on Linux using the 'userfault'
> +          system).
> +
> +  Running: POSTCOPY_RUN causes the destination to synchronise all
> +          state and start the CPUs and IO devices running.  The main
> +          thread now finishes processing the migration package and
> +          now carries on as it would for normal precopy migration
> +          (although it can't do the cleanup it would do as it
> +          finishes a normal migration).
> +
> +  End: The listen thread can now quit, and perform the cleanup of migration
> +          state, the migration is now complete.
> +
> +=== Source side page maps ===
> +
> +The source side keeps two bitmaps during postcopy; 'the migration bitmap'
> +and 'sent map'.  The 'migration bitmap' is basically the same as in
> +the precopy case, and holds a bit to indicate that page is 'dirty' -
> +i.e. needs sending.  During the precopy phase this is updated as the CPU
> +dirties pages, however during postcopy the CPUs are stopped and nothing
> +should dirty anything any more.
> +
> +The 'sent map' is used for the transition to postcopy. It is a bitmap that
> +has a bit set whenever a page is sent to the destination, however during
> +the transition to postcopy mode it is masked against the migration bitmap
> +(sentmap &= migrationbitmap) to generate a bitmap recording pages that
> +have been previously been sent but are now dirty again.  This masked
> +sentmap is sent to the destination which discards those now dirty pages
> +before starting the CPUs.
> +
> +Note that the contents of the sentmap are sacrificed during the calculation
> +of the discard set and thus aren't valid once in postcopy.  The dirtymap
> +is still valid and is used to ensure that no page is sent more than once.  Any
> +request for a page that has already been sent is ignored.  Duplicate requests
> +such as this can happen as a page is sent at about the same time the
> +destination accesses it.
>
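
The masking step described in the quoted text (sentmap &= migrationbitmap)
can be sketched in a few lines of standalone C. This is an illustration
only: make_discard_set is a hypothetical name, and the bitmaps are plain
arrays of unsigned long with one bit per page, not QEMU's bitmap type.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: turn the sent map into the discard set.
 * Afterwards a bit is set in 'sentmap' only for pages that were
 * already sent AND are dirty again, i.e. pages the destination must
 * throw away.  The original sent map contents are lost in the process.
 */
static void make_discard_set(unsigned long *sentmap,
                             const unsigned long *migbitmap,
                             size_t nwords)
{
    size_t i;

    assert(nwords == 0 || (sentmap != NULL && migbitmap != NULL));

    for (i = 0; i < nwords; i++) {
        sentmap[i] &= migbitmap[i];
    }
}
```

For example, with pages 1-3 sent (0xE) and pages 0-2 dirty (0x7), the
result is 0x6: pages 1 and 2 were sent but have been redirtied, so they
are the ones to discard.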

-- 
Thanks,
Yang.


* Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.
  2015-06-26  6:46   ` Yang Hongyang
@ 2015-06-26  7:53     ` zhanghailiang
  2015-06-26  8:00       ` Yang Hongyang
  0 siblings, 1 reply; 209+ messages in thread
From: zhanghailiang @ 2015-06-26  7:53 UTC (permalink / raw)
  To: Yang Hongyang, Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, peter.huangpeng, luis,
	amit.shah, pbonzini, david

On 2015/6/26 14:46, Yang Hongyang wrote:
> Hi Dave,
>
> On 06/16/2015 06:26 PM, Dr. David Alan Gilbert (git) wrote:
>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>
> [...]
>> += Postcopy =
>> +'Postcopy' migration is a way to deal with migrations that refuse to converge;
>> +its plus side is that there is an upper bound on the amount of migration traffic
>> +and time it takes, the down side is that during the postcopy phase, a failure of
>> +*either* side or the network connection causes the guest to be lost.
>> +
>> +In postcopy the destination CPUs are started before all the memory has been
>> +transferred, and accesses to pages that are yet to be transferred cause
>> +a fault that's translated by QEMU into a request to the source QEMU.
>
> I have a immature idea,
> Can we keep a source RAM cache on destination QEMU, instead of request to the
> source QEMU, that is:
>   - When start_postcopy issued, source will paused, and __open another socket
>     (maybe another migration thread)__ to send the remaining dirty pages to
>     destination, at the same time, destination will start, and cache the
>     remaining pages.

Er, it seems that the current implementation already works the way you describe, except for the RAM cache:
after the switch to postcopy mode, the source side keeps sending the remaining dirty pages, just as in precopy.
It does not need any cache at all; it just places the pages where they will be accessed.

>   - When the page fault occured, first lookup the page in the CACHE, if it is not
>     yet received, request to the source QEMU.
>   - Once the remaining dirty pages are transfered, the source QEMU can go now.
>
> The existing postcopy mechanism does not need to be changed, just add the
> remaining page transfer mechanism, and the RAM cache.
>
> I don't know if it is feasible and whether it will bring improvement to the
> postcopy, what do you think?
>
>> +
>> +Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
>> +doesn't finish in a given time the switch is made to postcopy.
>> +
>> +=== Enabling postcopy ===
>> +
>> +To enable postcopy (prior to the start of migration):
>> +
>> +migrate_set_capability x-postcopy-ram on
>> +
>> +The migration will still start in precopy mode, however issuing:
>> +
>> +migrate_start_postcopy
>> +
>> +will now cause the transition from precopy to postcopy.
>> +It can be issued immediately after migration is started or any
>> +time later on.  Issuing it after the end of a migration is harmless.
>> +
>> +=== Postcopy device transfer ===
>> +
>> +Loading of device data may cause the device emulation to access guest RAM
>> +that may trigger faults that have to be resolved by the source, as such
>> +the migration stream has to be able to respond with page data *during* the
>> +device load, and hence the device data has to be read from the stream completely
>> +before the device load begins to free the stream up.  This is achieved by
>> +'packaging' the device data into a blob that's read in one go.
>> +
>> +Source behaviour
>> +
>> +Until postcopy is entered the migration stream is identical to normal
>> +precopy, except for the addition of a 'postcopy advise' command at
>> +the beginning, to tell the destination that postcopy might happen.
>> +When postcopy starts the source sends the page discard data and then
>> +forms the 'package' containing:
>> +
>> +   Command: 'postcopy listen'
>> +   The device state
>> +      A series of sections, identical to the precopy streams device state stream
>> +      containing everything except postcopiable devices (i.e. RAM)
>> +   Command: 'postcopy run'
>> +
>> +The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
>> +contents are formatted in the same way as the main migration stream.
>> +
>> +Destination behaviour
>> +
>> +Initially the destination looks the same as precopy, with a single thread
>> +reading the migration stream; the 'postcopy advise' and 'discard' commands
>> +are processed to change the way RAM is managed, but don't affect the stream
>> +processing.
>> +
>> +------------------------------------------------------------------------------
>> +                        1      2   3     4 5                      6   7
>> +main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
>> +thread                             |       |
>> +                                   |     (page request)
>> +                                   |        \___
>> +                                   v            \
>> +listen thread:                     --- page -- page -- page -- page -- page --
>> +
>> +                                   a   b        c
>> +------------------------------------------------------------------------------
>> +
>> +On receipt of CMD_PACKAGED (1)
>> +   All the data associated with the package - the ( ... ) section in the
>> +diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
>> +recurses into qemu_loadvm_state_main to process the contents of the package (2)
>> +which contains commands (3,6) and devices (4...)
>> +
>> +On receipt of 'postcopy listen' - 3 -(i.e. the 1st command in the package)
>> +a new thread (a) is started that takes over servicing the migration stream,
>> +while the main thread carries on loading the package.   It loads normal
>> +background page data (b) but if during a device load a fault happens (5) the
>> +returned page (c) is loaded by the listen thread allowing the main threads
>> +device load to carry on.
>> +
>> +The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
>> +CPUs start running.
>> +At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
>> +and is no longer used by migration, while the listen thread carries
>> +on servicing page data until the end of migration.
>> +
>> +=== Postcopy states ===
>> +
>> +Postcopy moves through a series of states (see postcopy_state) from
>> +ADVISE->LISTEN->RUNNING->END
>> +
>> +  Advise: Set at the start of migration if postcopy is enabled, even
>> +          if it hasn't had the start command; here the destination
>> +          checks that its OS has the support needed for postcopy, and performs
>> +          setup to ensure the RAM mappings are suitable for later postcopy.
>> +          (Triggered by reception of POSTCOPY_ADVISE command)
>> +
>> +  Listen: The first command in the package, POSTCOPY_LISTEN, switches
>> +          the destination state to Listen, and starts a new thread
>> +          (the 'listen thread') which takes over the job of receiving
>> +          pages off the migration stream, while the main thread carries
>> +          on processing the blob.  With this thread able to process page
>> +          reception, the destination now 'sensitises' the RAM to detect
>> +          any access to missing pages (on Linux using the 'userfault'
>> +          system).
>> +
>> +  Running: POSTCOPY_RUN causes the destination to synchronise all
>> +          state and start the CPUs and IO devices running.  The main
>> +          thread now finishes processing the migration package and
>> +          now carries on as it would for normal precopy migration
>> +          (although it can't do the cleanup it would do as it
>> +          finishes a normal migration).
>> +
>> +  End: The listen thread can now quit, and perform the cleanup of migration
>> +          state, the migration is now complete.
>> +
>> +=== Source side page maps ===
>> +
>> +The source side keeps two bitmaps during postcopy; 'the migration bitmap'
>> +and 'sent map'.  The 'migration bitmap' is basically the same as in
>> +the precopy case, and holds a bit to indicate that page is 'dirty' -
>> +i.e. needs sending.  During the precopy phase this is updated as the CPU
>> +dirties pages, however during postcopy the CPUs are stopped and nothing
>> +should dirty anything any more.
>> +
>> +The 'sent map' is used for the transition to postcopy. It is a bitmap that
>> +has a bit set whenever a page is sent to the destination, however during
>> +the transition to postcopy mode it is masked against the migration bitmap
>> +(sentmap &= migrationbitmap) to generate a bitmap recording pages that
>> +have been previously been sent but are now dirty again.  This masked
>> +sentmap is sent to the destination which discards those now dirty pages
>> +before starting the CPUs.
>> +
>> +Note that the contents of the sentmap are sacrificed during the calculation
>> +of the discard set and thus aren't valid once in postcopy.  The dirtymap
>> +is still valid and is used to ensure that no page is sent more than once.  Any
>> +request for a page that has already been sent is ignored.  Duplicate requests
>> +such as this can happen as a page is sent at about the same time the
>> +destination accesses it.
>>
>


* Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.
  2015-06-26  7:53     ` zhanghailiang
@ 2015-06-26  8:00       ` Yang Hongyang
  2015-06-26  8:10         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 209+ messages in thread
From: Yang Hongyang @ 2015-06-26  8:00 UTC (permalink / raw)
  To: zhanghailiang, Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, peter.huangpeng, luis,
	amit.shah, pbonzini, david



On 06/26/2015 03:53 PM, zhanghailiang wrote:
> On 2015/6/26 14:46, Yang Hongyang wrote:
>> Hi Dave,
>>
>> On 06/16/2015 06:26 PM, Dr. David Alan Gilbert (git) wrote:
>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>
>> [...]
>>> += Postcopy =
>>> +'Postcopy' migration is a way to deal with migrations that refuse to converge;
>>> +its plus side is that there is an upper bound on the amount of migration
>>> traffic
>>> +and time it takes, the down side is that during the postcopy phase, a
>>> failure of
>>> +*either* side or the network connection causes the guest to be lost.
>>> +
>>> +In postcopy the destination CPUs are started before all the memory has been
>>> +transferred, and accesses to pages that are yet to be transferred cause
>>> +a fault that's translated by QEMU into a request to the source QEMU.
>>
>> I have a immature idea,
>> Can we keep a source RAM cache on destination QEMU, instead of request to the
>> source QEMU, that is:
>>   - When start_postcopy issued, source will paused, and __open another socket
>>     (maybe another migration thread)__ to send the remaining dirty pages to
>>     destination, at the same time, destination will start, and cache the
>>     remaining pages.
>
> Er, it seems that current implementation is just like what you described except
> the ram cache:
> After switch to post-copy mode, the source side will send the remaining dirty
> pages as pre-copy.
> Here it does not need any cache at all, it just places the dirty pages where it
> will be accessed.

I haven't looked into the implementation in detail, but if that is the case, I think it
should be documented here, or in the [Source behaviour] section below.

>
>>   - When the page fault occured, first lookup the page in the CACHE, if it is not
>>     yet received, request to the source QEMU.
>>   - Once the remaining dirty pages are transfered, the source QEMU can go now.
>>
>> The existing postcopy mechanism does not need to be changed, just add the
>> remaining page transfer mechanism, and the RAM cache.
>>
>> I don't know if it is feasible and whether it will bring improvement to the
>> postcopy, what do you think?
>>
>>> +
>>> +Postcopy can be combined with precopy (i.e. normal migration) so that if
>>> precopy
>>> +doesn't finish in a given time the switch is made to postcopy.
>>> +
>>> +=== Enabling postcopy ===
>>> +
>>> +To enable postcopy (prior to the start of migration):
>>> +
>>> +migrate_set_capability x-postcopy-ram on
>>> +
>>> +The migration will still start in precopy mode, however issuing:
>>> +
>>> +migrate_start_postcopy
>>> +
>>> +will now cause the transition from precopy to postcopy.
>>> +It can be issued immediately after migration is started or any
>>> +time later on.  Issuing it after the end of a migration is harmless.
>>> +
>>> +=== Postcopy device transfer ===
>>> +
>>> +Loading of device data may cause the device emulation to access guest RAM
>>> +that may trigger faults that have to be resolved by the source, as such
>>> +the migration stream has to be able to respond with page data *during* the
>>> +device load, and hence the device data has to be read from the stream
>>> completely
>>> +before the device load begins to free the stream up.  This is achieved by
>>> +'packaging' the device data into a blob that's read in one go.
>>> +
>>> +Source behaviour
>>> +
>>> +Until postcopy is entered the migration stream is identical to normal
>>> +precopy, except for the addition of a 'postcopy advise' command at
>>> +the beginning, to tell the destination that postcopy might happen.
>>> +When postcopy starts the source sends the page discard data and then
>>> +forms the 'package' containing:
>>> +
>>> +   Command: 'postcopy listen'
>>> +   The device state
>>> +      A series of sections, identical to the precopy streams device state
>>> stream
>>> +      containing everything except postcopiable devices (i.e. RAM)
>>> +   Command: 'postcopy run'
>>> +
>>> +The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
>>> +contents are formatted in the same way as the main migration stream.
>>> +
>>> +Destination behaviour
>>> +
>>> +Initially the destination looks the same as precopy, with a single thread
>>> +reading the migration stream; the 'postcopy advise' and 'discard' commands
>>> +are processed to change the way RAM is managed, but don't affect the stream
>>> +processing.
>>> +
>>> +------------------------------------------------------------------------------
>>> +                        1      2   3     4 5                      6   7
>>> +main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
>>> +thread                             |       |
>>> +                                   |     (page request)
>>> +                                   |        \___
>>> +                                   v            \
>>> +listen thread:                     --- page -- page -- page -- page -- page --
>>> +
>>> +                                   a   b        c
>>> +------------------------------------------------------------------------------
>>> +
>>> +On receipt of CMD_PACKAGED (1)
>>> +   All the data associated with the package - the ( ... ) section in the
>>> +diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
>>> +recurses into qemu_loadvm_state_main to process the contents of the package (2)
>>> +which contains commands (3,6) and devices (4...)
>>> +
>>> +On receipt of 'postcopy listen' - 3 -(i.e. the 1st command in the package)
>>> +a new thread (a) is started that takes over servicing the migration stream,
>>> +while the main thread carries on loading the package.   It loads normal
>>> +background page data (b) but if during a device load a fault happens (5) the
>>> +returned page (c) is loaded by the listen thread allowing the main threads
>>> +device load to carry on.
>>> +
>>> +The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the
>>> destination
>>> +CPUs start running.
>>> +At the end of the CMD_PACKAGED (7) the main thread returns to normal running
>>> behaviour
>>> +and is no longer used by migration, while the listen thread carries
>>> +on servicing page data until the end of migration.
>>> +
>>> +=== Postcopy states ===
>>> +
>>> +Postcopy moves through a series of states (see postcopy_state) from
>>> +ADVISE->LISTEN->RUNNING->END
>>> +
>>> +  Advise: Set at the start of migration if postcopy is enabled, even
>>> +          if it hasn't had the start command; here the destination
>>> +          checks that its OS has the support needed for postcopy, and performs
>>> +          setup to ensure the RAM mappings are suitable for later postcopy.
>>> +          (Triggered by reception of POSTCOPY_ADVISE command)
>>> +
>>> +  Listen: The first command in the package, POSTCOPY_LISTEN, switches
>>> +          the destination state to Listen, and starts a new thread
>>> +          (the 'listen thread') which takes over the job of receiving
>>> +          pages off the migration stream, while the main thread carries
>>> +          on processing the blob.  With this thread able to process page
>>> +          reception, the destination now 'sensitises' the RAM to detect
>>> +          any access to missing pages (on Linux using the 'userfault'
>>> +          system).
>>> +
>>> +  Running: POSTCOPY_RUN causes the destination to synchronise all
>>> +          state and start the CPUs and IO devices running.  The main
>>> +          thread now finishes processing the migration package and
>>> +          now carries on as it would for normal precopy migration
>>> +          (although it can't do the cleanup it would do as it
>>> +          finishes a normal migration).
>>> +
>>> +  End: The listen thread can now quit, and perform the cleanup of migration
>>> +          state, the migration is now complete.
>>> +
>>> +=== Source side page maps ===
>>> +
>>> +The source side keeps two bitmaps during postcopy; 'the migration bitmap'
>>> +and 'sent map'.  The 'migration bitmap' is basically the same as in
>>> +the precopy case, and holds a bit to indicate that page is 'dirty' -
>>> +i.e. needs sending.  During the precopy phase this is updated as the CPU
>>> +dirties pages, however during postcopy the CPUs are stopped and nothing
>>> +should dirty anything any more.
>>> +
>>> +The 'sent map' is used for the transition to postcopy. It is a bitmap that
>>> +has a bit set whenever a page is sent to the destination, however during
>>> +the transition to postcopy mode it is masked against the migration bitmap
>>> +(sentmap &= migrationbitmap) to generate a bitmap recording pages that
>>> +have been previously been sent but are now dirty again.  This masked
>>> +sentmap is sent to the destination which discards those now dirty pages
>>> +before starting the CPUs.
>>> +
>>> +Note that the contents of the sentmap are sacrificed during the calculation
>>> +of the discard set and thus aren't valid once in postcopy.  The dirtymap
>>> +is still valid and is used to ensure that no page is sent more than once.  Any
>>> +request for a page that has already been sent is ignored.  Duplicate requests
>>> +such as this can happen as a page is sent at about the same time the
>>> +destination accesses it.
>>>
>>
>
>
> .
>

-- 
Thanks,
Yang.


* Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.
  2015-06-26  8:00       ` Yang Hongyang
@ 2015-06-26  8:10         ` Dr. David Alan Gilbert
  2015-06-26  8:19           ` Yang Hongyang
  0 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-26  8:10 UTC (permalink / raw)
  To: Yang Hongyang
  Cc: aarcange, yamahata, zhanghailiang, quintela, liang.z.li,
	peter.huangpeng, qemu-devel, luis, amit.shah, pbonzini, david

* Yang Hongyang (yanghy@cn.fujitsu.com) wrote:
> 
> 
> On 06/26/2015 03:53 PM, zhanghailiang wrote:
> >On 2015/6/26 14:46, Yang Hongyang wrote:
> >>Hi Dave,
> >>
> >>On 06/16/2015 06:26 PM, Dr. David Alan Gilbert (git) wrote:
> >>>From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >>>
> >>[...]
> >>>+= Postcopy =
> >>>+'Postcopy' migration is a way to deal with migrations that refuse to converge;
> >>>+its plus side is that there is an upper bound on the amount of migration
> >>>traffic
> >>>+and time it takes, the down side is that during the postcopy phase, a
> >>>failure of
> >>>+*either* side or the network connection causes the guest to be lost.
> >>>+
> >>>+In postcopy the destination CPUs are started before all the memory has been
> >>>+transferred, and accesses to pages that are yet to be transferred cause
> >>>+a fault that's translated by QEMU into a request to the source QEMU.
> >>
> >>I have a immature idea,
> >>Can we keep a source RAM cache on destination QEMU, instead of request to the
> >>source QEMU, that is:
> >>  - When start_postcopy issued, source will paused, and __open another socket
> >>    (maybe another migration thread)__ to send the remaining dirty pages to
> >>    destination, at the same time, destination will start, and cache the
> >>    remaining pages.
> >
> >Er, it seems that current implementation is just like what you described except
> >the ram cache:
> >After switch to post-copy mode, the source side will send the remaining dirty
> >pages as pre-copy.
> >Here it does not need any cache at all, it just places the dirty pages where it
> >will be accessed.

Yes, zhanghailiang is correct; the source keeps sending other pages without being asked,
but when asked it sends the requested pages immediately, and the 'cache' is just
the main memory from which the destination is working.

However, the idea of using a separate socket is one that we have been thinking
about; one of the problems is that the urgently requested pages get delayed behind
the background page transfer, which increases the latency; a separate socket
should fix that.

> I haven't look into the implementation in detail, but if it is, I think it
> should be documented here...or in the below section [Source behaviour]

Yes, I can add to the documentation; I've added the following text:
  
  During postcopy the source scans the list of dirty pages and sends them
  to the destination without being requested (in much the same way as precopy).
  However, when a page request is received from the destination, the dirty page
  scanning restarts from the requested location.  This causes requested pages
  to be sent quickly, and also causes pages directly after the requested page
  to be sent quickly, in the hope that those pages are likely to be requested
  by the destination soon.
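
The scanning behaviour described in that text can be sketched roughly as below. This is only a simplified model, not the code from the series: struct scan_state, handle_page_request and next_page_to_send are hypothetical names, and an array with one bool per page stands in for the real migration bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not QEMU code. */
struct scan_state {
    bool  *dirty;    /* dirty[i] is true while page i still needs sending */
    size_t npages;   /* total number of pages */
    size_t cursor;   /* page at which the background scan resumes */
};

/* A page request from the destination moves the scan position. */
static void handle_page_request(struct scan_state *s, size_t page)
{
    assert(page < s->npages);
    s->cursor = page;
}

/*
 * Pick the next page to send: the first dirty page at or after the cursor,
 * wrapping round at most once.  The dirty flag is cleared so that no page
 * is sent twice.  Returns npages when nothing is left to send.
 */
static size_t next_page_to_send(struct scan_state *s)
{
    for (size_t n = 0; n < s->npages; n++) {
        size_t page = (s->cursor + n) % s->npages;

        if (s->dirty[page]) {
            s->dirty[page] = false;
            s->cursor = (page + 1) % s->npages;
            return page;
        }
    }
    return s->npages;
}
```

In this model a request for a page that has already been sent only moves the cursor; the page itself is not sent again because its dirty flag is already clear, which is in line with the 'duplicate requests are ignored' note in the page maps section.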
  
Dave

> >
> >>  - When the page fault occured, first lookup the page in the CACHE, if it is not
> >>    yet received, request to the source QEMU.
> >>  - Once the remaining dirty pages are transfered, the source QEMU can go now.
> >>
> >>The existing postcopy mechanism does not need to be changed, just add the
> >>remaining page transfer mechanism, and the RAM cache.
> >>
> >>I don't know if it is feasible and whether it will bring improvement to the
> >>postcopy, what do you think?
> >>
> >>>+
> >>>+Postcopy can be combined with precopy (i.e. normal migration) so that if
> >>>precopy
> >>>+doesn't finish in a given time the switch is made to postcopy.
> >>>+
> >>>+=== Enabling postcopy ===
> >>>+
> >>>+To enable postcopy (prior to the start of migration):
> >>>+
> >>>+migrate_set_capability x-postcopy-ram on
> >>>+
> >>>+The migration will still start in precopy mode, however issuing:
> >>>+
> >>>+migrate_start_postcopy
> >>>+
> >>>+will now cause the transition from precopy to postcopy.
> >>>+It can be issued immediately after migration is started or any
> >>>+time later on.  Issuing it after the end of a migration is harmless.
> >>>+
> >>>+=== Postcopy device transfer ===
> >>>+
> >>>+Loading of device data may cause the device emulation to access guest RAM
> >>>+that may trigger faults that have to be resolved by the source, as such
> >>>+the migration stream has to be able to respond with page data *during* the
> >>>+device load, and hence the device data has to be read from the stream
> >>>completely
> >>>+before the device load begins to free the stream up.  This is achieved by
> >>>+'packaging' the device data into a blob that's read in one go.
> >>>+
> >>>+Source behaviour
> >>>+
> >>>+Until postcopy is entered the migration stream is identical to normal
> >>>+precopy, except for the addition of a 'postcopy advise' command at
> >>>+the beginning, to tell the destination that postcopy might happen.
> >>>+When postcopy starts the source sends the page discard data and then
> >>>+forms the 'package' containing:
> >>>+
> >>>+   Command: 'postcopy listen'
> >>>+   The device state
> >>>+      A series of sections, identical to the precopy streams device state
> >>>stream
> >>>+      containing everything except postcopiable devices (i.e. RAM)
> >>>+   Command: 'postcopy run'
> >>>+
> >>>+The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
> >>>+contents are formatted in the same way as the main migration stream.
> >>>+
> >>>+Destination behaviour
> >>>+
> >>>+Initially the destination looks the same as precopy, with a single thread
> >>>+reading the migration stream; the 'postcopy advise' and 'discard' commands
> >>>+are processed to change the way RAM is managed, but don't affect the stream
> >>>+processing.
> >>>+
> >>>+------------------------------------------------------------------------------
> >>>+                        1      2   3     4 5                      6   7
> >>>+main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
> >>>+thread                             |       |
> >>>+                                   |     (page request)
> >>>+                                   |        \___
> >>>+                                   v            \
> >>>+listen thread:                     --- page -- page -- page -- page -- page --
> >>>+
> >>>+                                   a   b        c
> >>>+------------------------------------------------------------------------------
> >>>+
> >>>+On receipt of CMD_PACKAGED (1)
> >>>+   All the data associated with the package - the ( ... ) section in the
> >>>+diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
> >>>+recurses into qemu_loadvm_state_main to process the contents of the package (2)
> >>>+which contains commands (3,6) and devices (4...)
> >>>+
> >>>+On receipt of 'postcopy listen' - 3 -(i.e. the 1st command in the package)
> >>>+a new thread (a) is started that takes over servicing the migration stream,
> >>>+while the main thread carries on loading the package.   It loads normal
> >>>+background page data (b) but if during a device load a fault happens (5) the
> >>>+returned page (c) is loaded by the listen thread allowing the main threads
> >>>+device load to carry on.
> >>>+
> >>>+The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the
> >>>destination
> >>>+CPUs start running.
> >>>+At the end of the CMD_PACKAGED (7) the main thread returns to normal running
> >>>behaviour
> >>>+and is no longer used by migration, while the listen thread carries
> >>>+on servicing page data until the end of migration.
> >>>+
> >>>+=== Postcopy states ===
> >>>+
> >>>+Postcopy moves through a series of states (see postcopy_state) from
> >>>+ADVISE->LISTEN->RUNNING->END
> >>>+
> >>>+  Advise: Set at the start of migration if postcopy is enabled, even
> >>>+          if it hasn't had the start command; here the destination
> >>>+          checks that its OS has the support needed for postcopy, and performs
> >>>+          setup to ensure the RAM mappings are suitable for later postcopy.
> >>>+          (Triggered by reception of POSTCOPY_ADVISE command)
> >>>+
> >>>+  Listen: The first command in the package, POSTCOPY_LISTEN, switches
> >>>+          the destination state to Listen, and starts a new thread
> >>>+          (the 'listen thread') which takes over the job of receiving
> >>>+          pages off the migration stream, while the main thread carries
> >>>+          on processing the blob.  With this thread able to process page
> >>>+          reception, the destination now 'sensitises' the RAM to detect
> >>>+          any access to missing pages (on Linux using the 'userfault'
> >>>+          system).
> >>>+
> >>>+  Running: POSTCOPY_RUN causes the destination to synchronise all
> >>>+          state and start the CPUs and IO devices running.  The main
> >>>+          thread now finishes processing the migration package and
> >>>+          now carries on as it would for normal precopy migration
> >>>+          (although it can't do the cleanup it would do as it
> >>>+          finishes a normal migration).
> >>>+
> >>>+  End: The listen thread can now quit, and perform the cleanup of migration
> >>>+          state, the migration is now complete.
> >>>+
> >>>+=== Source side page maps ===
> >>>+
> >>>+The source side keeps two bitmaps during postcopy; 'the migration bitmap'
> >>>+and 'sent map'.  The 'migration bitmap' is basically the same as in
> >>>+the precopy case, and holds a bit to indicate that page is 'dirty' -
> >>>+i.e. needs sending.  During the precopy phase this is updated as the CPU
> >>>+dirties pages, however during postcopy the CPUs are stopped and nothing
> >>>+should dirty anything any more.
> >>>+
> >>>+The 'sent map' is used for the transition to postcopy. It is a bitmap that
> >>>+has a bit set whenever a page is sent to the destination, however during
> >>>+the transition to postcopy mode it is masked against the migration bitmap
> >>>+(sentmap &= migrationbitmap) to generate a bitmap recording pages that
> >>>+have been previously been sent but are now dirty again.  This masked
> >>>+sentmap is sent to the destination which discards those now dirty pages
> >>>+before starting the CPUs.
> >>>+
> >>>+Note that the contents of the sentmap are sacrificed during the calculation
> >>>+of the discard set and thus aren't valid once in postcopy.  The dirtymap
> >>>+is still valid and is used to ensure that no page is sent more than once.  Any
> >>>+request for a page that has already been sent is ignored.  Duplicate requests
> >>>+such as this can happen as a page is sent at about the same time the
> >>>+destination accesses it.
> >>>
> >>
> >
> >
> >.
> >
> 
> -- 
> Thanks,
> Yang.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.
  2015-06-26  8:10         ` Dr. David Alan Gilbert
@ 2015-06-26  8:19           ` Yang Hongyang
  0 siblings, 0 replies; 209+ messages in thread
From: Yang Hongyang @ 2015-06-26  8:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, zhanghailiang, quintela, liang.z.li,
	peter.huangpeng, qemu-devel, luis, amit.shah, pbonzini, david



On 06/26/2015 04:10 PM, Dr. David Alan Gilbert wrote:
> * Yang Hongyang (yanghy@cn.fujitsu.com) wrote:
>>
>>
>> On 06/26/2015 03:53 PM, zhanghailiang wrote:
>>> On 2015/6/26 14:46, Yang Hongyang wrote:
>>>> Hi Dave,
>>>>
>>>> On 06/16/2015 06:26 PM, Dr. David Alan Gilbert (git) wrote:
>>>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>>>
>>>> [...]
>>>>> += Postcopy =
>>>>> +'Postcopy' migration is a way to deal with migrations that refuse to converge;
>>>>> +its plus side is that there is an upper bound on the amount of migration
>>>>> traffic
>>>>> +and time it takes, the down side is that during the postcopy phase, a
>>>>> failure of
>>>>> +*either* side or the network connection causes the guest to be lost.
>>>>> +
>>>>> +In postcopy the destination CPUs are started before all the memory has been
>>>>> +transferred, and accesses to pages that are yet to be transferred cause
>>>>> +a fault that's translated by QEMU into a request to the source QEMU.
>>>>
>>>> I have a immature idea,
>>>> Can we keep a source RAM cache on destination QEMU, instead of request to the
>>>> source QEMU, that is:
>>>>   - When start_postcopy issued, source will paused, and __open another socket
>>>>     (maybe another migration thread)__ to send the remaining dirty pages to
>>>>     destination, at the same time, destination will start, and cache the
>>>>     remaining pages.
>>>
>>> Er, it seems that current implementation is just like what you described except
>>> the ram cache:
>>> After switch to post-copy mode, the source side will send the remaining dirty
>>> pages as pre-copy.
>>> Here it does not need any cache at all, it just places the dirty pages where it
>>> will be accessed.
>
> Yes, zhanghailiang is correct; the source keeps sending other pages without being asked,
> however when asked it sends requested pages immediately, and the 'cache' is just
> the main memory from which the destination is working.
>
> However, the idea of using a separate socket is one that we have been thinking
> about; one of the problems is that the urgent requested pages get delayed behind
> the background page transfer and that increases the latency; a separate socket
> should fix that.

That would be better.

>
>> I haven't looked into the implementation in detail, but if that is the case,
>> I think it should be documented here...or in the below section [Source behaviour]
>
> Yes, I can add to the documentation; I've added the following text:
>
>    During postcopy the source scans the list of dirty pages and sends them
>    to the destination without being requested (in much the same way as precopy),
>    however when a page request is received from the destination the dirty page
>    scanning restarts from the requested location.  This causes requested pages
>    to be sent quickly, and also causes pages directly after the requested page
>    to be sent quickly in the hope that those pages are likely to be requested
>    by the destination soon.

Looks clearer to me now :)

>
> Dave
>
>>>
>>>>   - When a page fault occurs, first look up the page in the CACHE; if it is not
>>>>     yet received, request it from the source QEMU.
>>>>   - Once the remaining dirty pages are transferred, the source QEMU can exit.
>>>>
>>>> The existing postcopy mechanism does not need to be changed, just add the
>>>> remaining page transfer mechanism, and the RAM cache.
>>>>
>>>> I don't know if it is feasible and whether it will bring improvement to the
>>>> postcopy, what do you think?
>>>>
>>>>> +
>>>>> +Postcopy can be combined with precopy (i.e. normal migration) so that if
>>>>> precopy
>>>>> +doesn't finish in a given time the switch is made to postcopy.
>>>>> +
>>>>> +=== Enabling postcopy ===
>>>>> +
>>>>> +To enable postcopy (prior to the start of migration):
>>>>> +
>>>>> +migrate_set_capability x-postcopy-ram on
>>>>> +
>>>>> +The migration will still start in precopy mode, however issuing:
>>>>> +
>>>>> +migrate_start_postcopy
>>>>> +
>>>>> +will now cause the transition from precopy to postcopy.
>>>>> +It can be issued immediately after migration is started or any
>>>>> +time later on.  Issuing it after the end of a migration is harmless.
>>>>> +
>>>>> +=== Postcopy device transfer ===
>>>>> +
>>>>> +Loading of device data may cause the device emulation to access guest RAM
>>>>> +that may trigger faults that have to be resolved by the source, as such
>>>>> +the migration stream has to be able to respond with page data *during* the
>>>>> +device load, and hence the device data has to be read from the stream
>>>>> completely
>>>>> +before the device load begins to free the stream up.  This is achieved by
>>>>> +'packaging' the device data into a blob that's read in one go.
>>>>> +
>>>>> +Source behaviour
>>>>> +
>>>>> +Until postcopy is entered the migration stream is identical to normal
>>>>> +precopy, except for the addition of a 'postcopy advise' command at
>>>>> +the beginning, to tell the destination that postcopy might happen.
>>>>> +When postcopy starts the source sends the page discard data and then
>>>>> +forms the 'package' containing:
>>>>> +
>>>>> +   Command: 'postcopy listen'
>>>>> +   The device state
>>>>> +      A series of sections, identical to the precopy streams device state
>>>>> stream
>>>>> +      containing everything except postcopiable devices (i.e. RAM)
>>>>> +   Command: 'postcopy run'
>>>>> +
>>>>> +The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
>>>>> +contents are formatted in the same way as the main migration stream.
>>>>> +
>>>>> +Destination behaviour
>>>>> +
>>>>> +Initially the destination looks the same as precopy, with a single thread
>>>>> +reading the migration stream; the 'postcopy advise' and 'discard' commands
>>>>> +are processed to change the way RAM is managed, but don't affect the stream
>>>>> +processing.
>>>>> +
>>>>> +------------------------------------------------------------------------------
>>>>> +                        1      2   3     4 5                      6   7
>>>>> +main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
>>>>> +thread                             |       |
>>>>> +                                   |     (page request)
>>>>> +                                   |        \___
>>>>> +                                   v            \
>>>>> +listen thread:                     --- page -- page -- page -- page -- page --
>>>>> +
>>>>> +                                   a   b        c
>>>>> +------------------------------------------------------------------------------
>>>>> +
>>>>> +On receipt of CMD_PACKAGED (1)
>>>>> +   All the data associated with the package - the ( ... ) section in the
>>>>> +diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
>>>>> +recurses into qemu_loadvm_state_main to process the contents of the package (2)
>>>>> +which contains commands (3,6) and devices (4...)
>>>>> +
>>>>> +On receipt of 'postcopy listen' - 3 -(i.e. the 1st command in the package)
>>>>> +a new thread (a) is started that takes over servicing the migration stream,
>>>>> +while the main thread carries on loading the package.   It loads normal
>>>>> +background page data (b) but if during a device load a fault happens (5) the
>>>>> +returned page (c) is loaded by the listen thread allowing the main thread's
>>>>> +device load to carry on.
>>>>> +
>>>>> +The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the
>>>>> destination
>>>>> +CPUs start running.
>>>>> +At the end of the CMD_PACKAGED (7) the main thread returns to normal running
>>>>> behaviour
>>>>> +and is no longer used by migration, while the listen thread carries
>>>>> +on servicing page data until the end of migration.
>>>>> +
>>>>> +=== Postcopy states ===
>>>>> +
>>>>> +Postcopy moves through a series of states (see postcopy_state) from
>>>>> +ADVISE->LISTEN->RUNNING->END
>>>>> +
>>>>> +  Advise: Set at the start of migration if postcopy is enabled, even
>>>>> +          if it hasn't had the start command; here the destination
>>>>> +          checks that its OS has the support needed for postcopy, and performs
>>>>> +          setup to ensure the RAM mappings are suitable for later postcopy.
>>>>> +          (Triggered by reception of POSTCOPY_ADVISE command)
>>>>> +
>>>>> +  Listen: The first command in the package, POSTCOPY_LISTEN, switches
>>>>> +          the destination state to Listen, and starts a new thread
>>>>> +          (the 'listen thread') which takes over the job of receiving
>>>>> +          pages off the migration stream, while the main thread carries
>>>>> +          on processing the blob.  With this thread able to process page
>>>>> +          reception, the destination now 'sensitises' the RAM to detect
>>>>> +          any access to missing pages (on Linux using the 'userfault'
>>>>> +          system).
>>>>> +
>>>>> +  Running: POSTCOPY_RUN causes the destination to synchronise all
>>>>> +          state and start the CPUs and IO devices running.  The main
>>>>> +          thread now finishes processing the migration package and
>>>>> +          now carries on as it would for normal precopy migration
>>>>> +          (although it can't do the cleanup it would do as it
>>>>> +          finishes a normal migration).
>>>>> +
>>>>> +  End: The listen thread can now quit, and perform the cleanup of migration
>>>>> +          state, the migration is now complete.
>>>>> +
>>>>> +=== Source side page maps ===
>>>>> +
>>>>> +The source side keeps two bitmaps during postcopy; 'the migration bitmap'
>>>>> +and 'sent map'.  The 'migration bitmap' is basically the same as in
>>>>> +the precopy case, and holds a bit to indicate that page is 'dirty' -
>>>>> +i.e. needs sending.  During the precopy phase this is updated as the CPU
>>>>> +dirties pages, however during postcopy the CPUs are stopped and nothing
>>>>> +should dirty anything any more.
>>>>> +
>>>>> +The 'sent map' is used for the transition to postcopy. It is a bitmap that
>>>>> +has a bit set whenever a page is sent to the destination, however during
>>>>> +the transition to postcopy mode it is masked against the migration bitmap
>>>>> +(sentmap &= migrationbitmap) to generate a bitmap recording pages that
>>>>> +have previously been sent but are now dirty again.  This masked
>>>>> +sentmap is sent to the destination which discards those now dirty pages
>>>>> +before starting the CPUs.
>>>>> +
>>>>> +Note that the contents of the sentmap are sacrificed during the calculation
>>>>> +of the discard set and thus aren't valid once in postcopy.  The dirtymap
>>>>> +is still valid and is used to ensure that no page is sent more than once.  Any
>>>>> +request for a page that has already been sent is ignored.  Duplicate requests
>>>>> +such as this can happen as a page is sent at about the same time the
>>>>> +destination accesses it.
>>>>>
>>>>
>>>
>>>
>>> .
>>>
>>
>> --
>> Thanks,
>> Yang.
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
>

-- 
Thanks,
Yang.
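
For readers following the 'Source side page maps' text quoted above, the
discard-set calculation (sentmap &= migrationbitmap) can be sketched roughly as
below. This is only an illustration under stated assumptions, not code from the
series: the function name compute_discard_set is hypothetical, and the bitmaps
are assumed to be plain arrays of unsigned long of equal length.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the patch series.
 *
 * sentmap:          one bit per page, set if the page was sent in precopy.
 * migration_bitmap: one bit per page, set if the page is currently dirty.
 *
 * After the call, sentmap holds only the pages that were sent AND have
 * been dirtied again, i.e. the pages the destination has to discard.
 * The original contents of sentmap are lost, as the quoted text notes.
 * Returns the number of pages in the discard set.
 */
size_t compute_discard_set(unsigned long *sentmap,
                           const unsigned long *migration_bitmap,
                           size_t nwords)
{
    size_t discard = 0;

    assert(sentmap != NULL && migration_bitmap != NULL);

    for (size_t i = 0; i < nwords; i++) {
        unsigned long w;

        sentmap[i] &= migration_bitmap[i];

        /* Count the set bits in this word (clear lowest set bit each pass) */
        for (w = sentmap[i]; w != 0; w &= w - 1) {
            discard++;
        }
    }
    return discard;
}
```

The in-place mask mirrors the "sacrificed" sentmap described in the quoted
documentation; a caller that still needs the original sentmap would have to
copy it first.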

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 14/42] Return path: Send responses from destination to source
  2015-06-19 18:42     ` Dr. David Alan Gilbert
@ 2015-07-01  9:29       ` Juan Quintela
  2015-08-06 12:18         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-01  9:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
>> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
>> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> >
>> > Add migrate_send_rp_message to send a message from destination to source along the return path.
>> >   (It uses a mutex to let it be called from multiple threads)
>> > Add migrate_send_rp_shut to send a 'shut' message to indicate
>> >   the destination is finished with the RP.
>> > Add migrate_send_rp_pong to send a 'PONG' message in response to a PING
>> >   Use it in the MSG_RP_PING handler
>> >
>> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> > ---
>> >  include/migration/migration.h | 17 ++++++++++++++++
>> >  migration/migration.c         | 45 +++++++++++++++++++++++++++++++++++++++++++
>> >  migration/savevm.c            |  2 +-
>> >  trace-events                  |  1 +
>> >  4 files changed, 64 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/include/migration/migration.h b/include/migration/migration.h
>> > index 65fe5db..36caab9 100644
>> > --- a/include/migration/migration.h
>> > +++ b/include/migration/migration.h
>> > @@ -42,12 +42,20 @@ struct MigrationParams {
>> >      bool shared;
>> >  };
>> >  
>> > +/* Messages sent on the return path from destination to source */
>> > +enum mig_rp_message_type {
>> > +    MIG_RP_MSG_INVALID = 0,  /* Must be 0 */
>> > +    MIG_RP_MSG_SHUT,         /* sibling will not send any more RP messages */
>> > +    MIG_RP_MSG_PONG,         /* Response to a PING; data (seq: be32 ) */
>> > +};
>> > +
>> >  typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
>> >  /* State for the incoming migration */
>> >  struct MigrationIncomingState {
>> >      QEMUFile *file;
>> >  
>> >      QEMUFile *return_path;
>> > +    QemuMutex      rp_mutex;    /* We send replies from multiple threads */
>> >  
>> >      /* See savevm.c */
>> >      LoadStateEntry_Head loadvm_handlers;
>> > @@ -179,6 +187,15 @@ int migrate_compress_level(void);
>> >  int migrate_compress_threads(void);
>> >  int migrate_decompress_threads(void);
>> >  
>> > +/* Sending on the return path - generic and then for each message type */
>> > +void migrate_send_rp_message(MigrationIncomingState *mis,
>> > +                             enum mig_rp_message_type message_type,
>> > +                             uint16_t len, void *data);
>> > +void migrate_send_rp_shut(MigrationIncomingState *mis,
>> > +                          uint32_t value);
>> > +void migrate_send_rp_pong(MigrationIncomingState *mis,
>> > +                          uint32_t value);
>> > +
>> >  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
>> >  void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
>> >  void ram_control_load_hook(QEMUFile *f, uint64_t flags);
>> > diff --git a/migration/migration.c b/migration/migration.c
>> > index 295f15a..afb19a1 100644
>> > --- a/migration/migration.c
>> > +++ b/migration/migration.c
>> > @@ -85,6 +85,7 @@ MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
>> >      mis_current = g_malloc0(sizeof(MigrationIncomingState));
>> >      mis_current->file = f;
>> >      QLIST_INIT(&mis_current->loadvm_handlers);
>> > +    qemu_mutex_init(&mis_current->rp_mutex);
>> >  
>> >      return mis_current;
>> >  }
>> > @@ -182,6 +183,50 @@ void process_incoming_migration(QEMUFile *f)
>> >      qemu_coroutine_enter(co, f);
>> >  }
>> >  
>> > +/*
>> > + * Send a message on the return channel back to the source
>> > + * of the migration.
>> > + */
>> > +void migrate_send_rp_message(MigrationIncomingState *mis,
>> > +                             enum mig_rp_message_type message_type,
>> > +                             uint16_t len, void *data)
>> > +{
>> > +    trace_migrate_send_rp_message((int)message_type, len);
>> > +    qemu_mutex_lock(&mis->rp_mutex);
>> > +    qemu_put_be16(mis->return_path, (unsigned int)message_type);
>> > +    qemu_put_be16(mis->return_path, len);
>> if (len) {
>> 
>> > +    qemu_put_buffer(mis->return_path, data, len);
>> }
>> 
>> 
>> ?
>> 
>> We check for zero sized command on control commands but not on
>> responses?
>
> Or should I remove the check in the control commands case?
> qemu_put_buffer looks like it's safe for size == 0

I would go for this, just for consistency.
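
To make the framing under discussion concrete, here is a minimal stand-alone
sketch of a return-path message as the quoted hunk lays it out: a big-endian
16-bit type, a big-endian 16-bit length, then the payload. It is an
illustration only; rp_encode_message is a hypothetical name, and it writes
into a caller-supplied byte buffer rather than a QEMUFile. The sketch guards
the copy with "if (len)" because passing a null pointer to memcpy is undefined
behaviour in C even for a zero length; whether qemu_put_buffer needs the same
guard is a separate question that this sketch does not answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical encoder, not part of the patch.
 * Writes: be16 type, be16 len, then len bytes of payload.
 * Returns the number of bytes written, or 0 if 'out' is too small.
 */
size_t rp_encode_message(uint8_t *out, size_t out_size,
                         uint16_t type, uint16_t len, const void *data)
{
    size_t total = 4 + (size_t)len;

    if (out_size < total) {
        return 0;
    }
    /* A payload pointer is only required when there is a payload */
    assert(len == 0 || data != NULL);

    out[0] = (uint8_t)(type >> 8);
    out[1] = (uint8_t)(type & 0xff);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)(len & 0xff);

    if (len) {
        memcpy(out + 4, data, len);
    }
    return total;
}
```

With this layout a PONG carrying a 32-bit sequence number occupies 8 bytes on
the wire, and a message with no payload occupies 4.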

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 03/42] Init page sizes in qtest
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 03/42] Init page sizes in qtest Dr. David Alan Gilbert (git)
  2015-06-17 11:49   ` Juan Quintela
@ 2015-07-06  6:14   ` Amit Shah
  2015-08-04  5:23   ` Amit Shah
  2 siblings, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-06  6:14 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:16], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> One of my patches used a loop that was based on host page size;
> it dies in qtest since qtest hadn't bothered init'ing it.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 04/42] qemu_ram_block_from_host
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 04/42] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
  2015-06-17 11:54   ` Juan Quintela
@ 2015-07-10  8:36   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-10  8:36 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:17], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Postcopy sends RAMBlock names and offsets over the wire (since it can't
> rely on the order of ramaddr being the same), and it starts out with
> HVA fault addresses from the kernel.
> 
> qemu_ram_block_from_host translates a HVA into a RAMBlock, an offset
> in the RAMBlock and the global ram_addr_t value.
> 
> Rewrite qemu_ram_addr_from_host to use qemu_ram_block_from_host.
> 
> Provide qemu_ram_get_idstr since it's the actual name text sent on the
> wire.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 05/42] Add qemu_get_buffer_less_copy to avoid copies some of the time
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 05/42] Add qemu_get_buffer_less_copy to avoid copies some of the time Dr. David Alan Gilbert (git)
  2015-06-17 11:57   ` Juan Quintela
@ 2015-07-13  9:08   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-13  9:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:18], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> qemu_get_buffer always copies the data it reads to a user's buffer,
> however in many cases the file buffer inside qemu_file could be given
> back to the caller, avoiding the copy.  This isn't always possible
> depending on the size and alignment of the data.
> 
> Thus 'qemu_get_buffer_less_copy' either copies the data to a supplied
> buffer or updates a pointer to the internal buffer if convenient.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 07/42] ram_debug_dump_bitmap: Dump a migration bitmap as text
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 07/42] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
  2015-06-17 12:17   ` Juan Quintela
@ 2015-07-13  9:12   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-13  9:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:20], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Useful for debugging the migration bitmap and other bitmaps
> of the same format (including the sentmap in postcopy).
> 
> The bitmap is printed to stderr.
> Lines that are all the expected value are excluded so the output
> can be quite compact for many bitmaps.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 08/42] migrate_init: Call from savevm
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 08/42] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
  2015-06-17 12:18   ` Juan Quintela
@ 2015-07-13  9:13   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-13  9:13 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:21], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Suspend to file is very much like a migrate, and it makes life
> easier if we have the Migration state available, so initialise it
> in the savevm.c code for suspending.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 10/42] Return path: Open a return path on QEMUFile for sockets
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 10/42] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
  2015-06-17 12:23   ` Juan Quintela
@ 2015-07-13 10:12   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-13 10:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:23], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Postcopy needs a method to send messages from the destination back to
> the source, this is the 'return path'.
> 
> Wire it up for 'socket' QEMUFile's.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

Thanks, this looks better than the dup way of doing it.

		Amit

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 07/42] ram_debug_dump_bitmap: Dump a migration bitmap as text
  2015-06-19 17:04     ` Dr. David Alan Gilbert
@ 2015-07-13 10:15       ` Juan Quintela
  0 siblings, 0 replies; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 10:15 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
>> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
>> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> >
>> > Useful for debugging the migration bitmap and other bitmaps
>> > of the same format (including the sentmap in postcopy).
>> >
>> > The bitmap is printed to stderr.
>> > Lines that are all the expected value are excluded so the output
>> > can be quite compact for many bitmaps.
>> >
>> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> > ---
>> >  include/migration/migration.h |  1 +
>> >  migration/ram.c               | 38 ++++++++++++++++++++++++++++++++++++++
>> >  2 files changed, 39 insertions(+)
>> >
>> > diff --git a/include/migration/migration.h b/include/migration/migration.h
>> > index 9387c8c..b3a7f75 100644
>> > --- a/include/migration/migration.h
>> > +++ b/include/migration/migration.h
>> > @@ -144,6 +144,7 @@ uint64_t xbzrle_mig_pages_cache_miss(void);
>> >  double xbzrle_mig_cache_miss_rate(void);
>> >  
>> >  void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
>> > +void ram_debug_dump_bitmap(unsigned long *todump, bool expected);
>> >  
>> >  /**
>> >   * @migrate_add_blocker - prevent migration from proceeding
>> > diff --git a/migration/ram.c b/migration/ram.c
>> > index 57368e1..efc215a 100644
>> > --- a/migration/ram.c
>> > +++ b/migration/ram.c
>> > @@ -1051,6 +1051,44 @@ static void reset_ram_globals(void)
>> >  
>> >  #define MAX_WAIT 50 /* ms, half buffered_file limit */
>> >  
>> > +/*
>> > + * 'expected' is the value you expect the bitmap mostly to be full
>> > + * of; it won't bother printing lines that are all this value.
>> > + * If 'todump' is null the migration bitmap is dumped.
>> > + */
>> > +void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
>> > +{
>> > +    int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
>> > +
>> > +    int64_t cur;
>> > +    int64_t linelen = 128;
>> > +    char linebuf[129];
>> > +
>> > +    if (!todump) {
>> > +        todump = migration_bitmap;
>> > +    }
>> 
>> Why?  Just assert that todump != NULL?
>
> 'migration_bitmap' is static to ram.c, so allowing NULL to get
> you a dump of the migration_bitmap means that if you call this
> dump routine from any error path you're debugging anywhere in qemu
> then you can dump the migration bitmap.  e.g. I was adding calls
> to this in migration.c and migration/postcopy-ram.c in error
> paths I was trying to debug.

ok.

>
>> > +    for (cur = 0; cur < ram_pages; cur += linelen) {
>> > +        int64_t curb;
>> > +        bool found = false;
>> > +        /*
>> > +         * Last line; catch the case where the line length
>> > +         * is longer than remaining ram
>> > +         */
>> > +        if (cur + linelen > ram_pages) {
>> > +            linelen = ram_pages - cur;
>> > +        }
>> > +        for (curb = 0; curb < linelen; curb++) {
>> > +            bool thisbit = test_bit(cur + curb, todump);
>> > +            linebuf[curb] = thisbit ? '1' : '.';
>> 
>> Put 1 and 0?  Why the dot?
>
> It's easier to see an occasional '1' in a big field of .'s.

ok.

>
>> > +            found = found || (thisbit != expected);
>> > +        }
>> > +        if (found) {
>> > +            linebuf[curb] = '\0';
>> > +            fprintf(stderr,  "0x%08" PRIx64 " : %s\n", cur, linebuf);
>> > +        }
>> > +    }
>> > +}
>> 
>> 
>> And once here, why are we doing it this way?  We have
>> 
>> find_first_bit(addr, nbits) and find_first_zero_bit(addr, nbits) and
>> friends?
>> 
>> Doing the walk by hand looks weird, no?
>
> Here's a compile-tested-only version using find_  - it's bigger, if you think
> it's better I can use this instead:
>
> void ram_debug_dump_bitmap(unsigned long *todump, bool expected)
> {
>     int64_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
>
>     int64_t cur;
>     int64_t linelen = 128;
>
>     if (!todump) {
>         todump = migration_bitmap;
>     }
>
>     for (cur = 0; cur < ram_pages; cur += linelen) {
>         int64_t curb;
>         unsigned long next_bit;
>
>         /*
>          * Last line; catch the case where the line length
>          * is longer than remaining ram
>          */
>         if (cur + linelen > ram_pages) {
>             linelen = ram_pages - cur;
>         }
>         if (expected) {
>             next_bit = find_next_bit(todump, cur + linelen, cur);
>         } else {
>             next_bit = find_next_zero_bit(todump, cur + linelen, cur);
>         }
>         if (next_bit >= (cur + linelen)) {
>             continue;
>         }
>
>         for (curb = 0; curb < linelen; curb++) {
>             bool thisbit = test_bit(cur + curb, todump);
>             fputc(thisbit ? '1' : '.', stderr);
>         }
>         fputc('\n', stderr);
>     }
> }
>
> Dave

Reviewed-by: Juan Quintela <quintela@redhat.com>


>
>
>> 
>> Later, Juan.
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 15/42] Return path: Source handling of return path
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 15/42] Return path: Source handling of return path Dr. David Alan Gilbert (git)
@ 2015-07-13 10:29   ` Juan Quintela
  2015-08-18 10:23     ` Dr. David Alan Gilbert
  2015-07-15  7:50   ` Amit Shah
  2015-08-05  8:06   ` zhanghailiang
  2 siblings, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 10:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Open a return path, and handle messages that are received upon it.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> +/*
> + * Handles messages sent on the return path towards the source VM
> + *
> + */
> +static void *source_return_path_thread(void *opaque)
> +{
> +    MigrationState *ms = opaque;
> +    QEMUFile *rp = ms->rp_state.file;
> +    uint16_t expected_len, header_len, header_type;
> +    const int max_len = 512;
> +    uint8_t buf[max_len];
> +    uint32_t tmp32;
> +    int res;
> +
> +    trace_source_return_path_thread_entry();
> +    while (rp && !qemu_file_get_error(rp) &&

What can make rp == NULL?
Thinking about that, could you mean *rp here?


> +        migration_already_active(ms)) {
> +        trace_source_return_path_thread_loop_top();
> +        header_type = qemu_get_be16(rp);
> +        header_len = qemu_get_be16(rp);
> +
> +        switch (header_type) {
> +        case MIG_RP_MSG_SHUT:
> +        case MIG_RP_MSG_PONG:
> +            expected_len = 4;
> +            break;
> +
> +        default:
> +            error_report("RP: Received invalid message 0x%04x length 0x%04x",
> +                    header_type, header_len);
> +            source_return_path_bad(ms);
> +            goto out;
> +        }
>  
> +        if (header_len > expected_len) {
> +            error_report("RP: Received message 0x%04x with "
> +                    "incorrect length %d expecting %d",
> +                    header_type, header_len,
> +                    expected_len);

I know this is a big request, but getting an array with message lengths
and message names to be able to print nice error messages looks like a good idea?

> +            source_return_path_bad(ms);
> +            goto out;
> +        }
> +
> +        /* We know we've got a valid header by this point */
> +        res = qemu_get_buffer(rp, buf, header_len);
> +        if (res != header_len) {
> +            trace_source_return_path_thread_failed_read_cmd_data();
> +            source_return_path_bad(ms);
> +            goto out;
> +        }
> +
> +        /* OK, we have the message and the data */
> +        switch (header_type) {
> +        case MIG_RP_MSG_SHUT:
> +            tmp32 = be32_to_cpup((uint32_t *)buf);

make local variable and call it sibling_error or whatever you like?

> +            trace_source_return_path_thread_shut(tmp32);
> +            if (tmp32) {
> +                error_report("RP: Sibling indicated error %d", tmp32);
> +                source_return_path_bad(ms);
> +            }
> +            /*
> +             * We'll let the main thread deal with closing the RP
> +             * we could do a shutdown(2) on it, but we're the only user
> +             * anyway, so there's nothing gained.
> +             */
> +            goto out;
> +
> +        case MIG_RP_MSG_PONG:
> +            tmp32 = be32_to_cpup((uint32_t *)buf);

unused?
Although I guess it is used somewhere to make sure that the value is
the same as whatever we sent with the ping.  credentials?

I can't see with this and previous patch what value is sent here.


> +            trace_source_return_path_thread_pong(tmp32);
> +            break;
> +
> +        default:
> +            break;
> +        }
> +    }
> +    if (rp && qemu_file_get_error(rp)) {
> +        trace_source_return_path_thread_bad_end();
> +        source_return_path_bad(ms);
> +    }
> +
> +    trace_source_return_path_thread_end();
> +out:
> +    return NULL;
> +}
> +
> +__attribute__ (( unused )) /* Until later in patch series */

unused_by_now attribute required O:-)
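
The table of message names and lengths suggested above could look something
like the following sketch. Everything beyond the three enum values and the
4-byte payload length for SHUT and PONG (both taken from the quoted patches)
is an assumption: MIG_RP_MSG_MAX, struct rp_msg_info, rp_msg_lookup and
rp_header_ok are hypothetical names. The length check here is an exact match,
which is stricter than the '>' comparison in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum mig_rp_message_type {
    MIG_RP_MSG_INVALID = 0,  /* Must be 0 */
    MIG_RP_MSG_SHUT,         /* sibling will not send any more RP messages */
    MIG_RP_MSG_PONG,         /* Response to a PING; data (seq: be32) */
    MIG_RP_MSG_MAX           /* hypothetical: number of entries in the table */
};

static_assert(MIG_RP_MSG_INVALID == 0, "INVALID must stay at 0");

struct rp_msg_info {
    const char *name;
    int len;                 /* expected payload length, -1 if not receivable */
};

static const struct rp_msg_info rp_msg_table[MIG_RP_MSG_MAX] = {
    [MIG_RP_MSG_INVALID] = { "INVALID", -1 },
    [MIG_RP_MSG_SHUT]    = { "SHUT",     4 },
    [MIG_RP_MSG_PONG]    = { "PONG",     4 },
};

/* Returns the table entry for a receivable message type, else NULL */
const struct rp_msg_info *rp_msg_lookup(uint16_t type)
{
    if (type >= MIG_RP_MSG_MAX || rp_msg_table[type].len < 0) {
        return NULL;
    }
    return &rp_msg_table[type];
}

/* Returns 1 if the header names a known message with the expected length */
int rp_header_ok(uint16_t type, uint16_t len)
{
    const struct rp_msg_info *info = rp_msg_lookup(type);

    return info != NULL && info->len == (int)len;
}
```

With such a table the first switch in source_return_path_thread would reduce
to a lookup, and the error report could print the message name next to the
numeric type.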

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 16/42] Rework loadvm path for subloops
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 16/42] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
@ 2015-07-13 10:33   ` Juan Quintela
  2015-07-15  9:34   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 10:33 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Postcopy needs to have two migration streams loading concurrently;
> one from memory (with the device state) and the other from the fd
> with the memory transactions.
>
> Split the core of qemu_loadvm_state out so we can use it for both.
>
> Allow the inner loadvm loop to quit and cause the parent loops to
> exit as well.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 17/42] Add migration-capability boolean for postcopy-ram.
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 17/42] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
  2015-06-16 15:43   ` Eric Blake
@ 2015-07-13 10:35   ` Juan Quintela
  2015-07-15  9:40   ` Amit Shah
  2 siblings, 0 replies; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 10:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The 'postcopy ram' capability allows postcopy migration of RAM;
> note that the migration starts off in precopy mode until
> postcopy mode is triggered (see the migrate_start_postcopy
> patch later in the series).
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>


* Re: [Qemu-devel] [PATCH v7 18/42] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 18/42] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
@ 2015-07-13 11:02   ` Juan Quintela
  2015-07-20 10:13     ` Amit Shah
  2015-08-26 14:48     ` Dr. David Alan Gilbert
  2015-07-20 10:06   ` Amit Shah
  1 sibling, 2 replies; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 11:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The state of the postcopy process is managed via a series of messages;
>    * Add wrappers and handlers for sending/receiving these messages
>    * Add state variable that track the current state of postcopy
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h |  16 +++
>  include/sysemu/sysemu.h       |  20 ++++
>  migration/migration.c         |  13 +++
>  migration/savevm.c            | 247 ++++++++++++++++++++++++++++++++++++++++++
>  trace-events                  |  10 ++
>  5 files changed, 306 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index cd89a9b..34cd9a6 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1128,3 +1128,16 @@ void migrate_fd_connect(MigrationState *s)
>      qemu_thread_create(&s->thread, "migration", migration_thread, s,
>                         QEMU_THREAD_JOINABLE);
>  }
> +
> +PostcopyState  postcopy_state_get(MigrationIncomingState *mis)
> +{
> +    return atomic_fetch_add(&mis->postcopy_state, 0);

What is wrong with atomic_read() here?
As the set of the state is atomic, even a normal read would do (I think)
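
To make the comparison concrete, here is a minimal sketch of the two ways of reading the state. It uses C11 <stdatomic.h> as a stand-in for QEMU's own atomic macros, and the enum and function names (demo_*) are placeholders invented for illustration, not the definitions from the series.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for PostcopyState; the values are illustrative only. */
typedef enum {
    DEMO_POSTCOPY_NONE = 0,
    DEMO_POSTCOPY_ADVISE,
    DEMO_POSTCOPY_RUNNING,
} DemoPostcopyState;

static _Atomic int demo_state = DEMO_POSTCOPY_NONE;

/* Read-modify-write that adds zero: returns the current value,
 * but still performs a full atomic RMW operation. */
static DemoPostcopyState demo_state_get_rmw(void)
{
    return (DemoPostcopyState)atomic_fetch_add(&demo_state, 0);
}

/* Plain atomic load: observes the same value without the write half. */
static DemoPostcopyState demo_state_get_load(void)
{
    return (DemoPostcopyState)atomic_load(&demo_state);
}

/* Atomic exchange: stores the new state and returns the previous one. */
static DemoPostcopyState demo_state_set(DemoPostcopyState new_state)
{
    return (DemoPostcopyState)atomic_exchange(&demo_state, (int)new_state);
}
```

Both getters return the same value; the difference is only in how much work the read does.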

> +void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
> +                                           uint16_t len,
> +                                           uint64_t *start_list,
> +                                           uint64_t *end_list)

I haven't looked at the following patches where this function is used,
but it appears that getting an iovec could be a good idea?

> +{
> +    uint8_t *buf;
> +    uint16_t tmplen;
> +    uint16_t t;
> +    size_t name_len = strlen(name);
> +
> +    trace_qemu_savevm_send_postcopy_ram_discard(name, len);
> +    buf = g_malloc0(len*16 + name_len + 3);

I would suggest
       g_malloc0(1 + 1 + name_len + 1 + (8 + 8) * len)

       just to make clear where each term comes from.

       I think that we don't need the \0 at all.  If the \0 is not there,
       what strlen() returns is going to be "funny".  So, we can just change
       the assert to name_len < 255?
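
The size breakdown above can be sketched as a small standalone encoder. This is only an illustration of the layout described in the quoted patch (version byte, name-length byte, name, NUL, then big-endian start/end pairs); the demo_* names are hypothetical, plain calloc() stands in for g_malloc0(), and the oversized-name case returns NULL instead of asserting.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Store a 64-bit value big-endian, byte by byte (no alignment requirement). */
static void demo_put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* version byte + name-length byte + name + NUL + one (start, end) pair per range */
static size_t demo_discard_size(size_t name_len, uint16_t nranges)
{
    return 1 + 1 + name_len + 1 + (size_t)(8 + 8) * nranges;
}

/* Returns a heap buffer the caller must free(), or NULL on bad input or
 * allocation failure.  *out_len receives the encoded length. */
static uint8_t *demo_discard_encode(const char *name, uint16_t nranges,
                                    const uint64_t *start_list,
                                    const uint64_t *end_list,
                                    size_t *out_len)
{
    size_t name_len = strlen(name);
    size_t pos = 0;
    uint8_t *buf;

    if (name_len >= 256) {      /* must fit in the single length byte */
        return NULL;
    }
    *out_len = demo_discard_size(name_len, nranges);
    buf = calloc(1, *out_len);
    if (!buf) {
        return NULL;
    }

    buf[pos++] = 0;                     /* version */
    buf[pos++] = (uint8_t)name_len;     /* name length */
    memcpy(buf + pos, name, name_len);
    pos += name_len;
    buf[pos++] = '\0';                  /* terminator, as in the quoted patch */

    for (uint16_t t = 0; t < nranges; t++) {
        demo_put_be64(buf + pos, start_list[t]);
        pos += 8;
        demo_put_be64(buf + pos, end_list[t]);
        pos += 8;
    }
    return buf;
}
```

Checking the length before allocating also covers the "assert before the malloc" point raised below, since nothing is allocated on the error path.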

> +    buf[0] = 0; /* Version */
> +    assert(name_len < 256);

Can we move the assert before the malloc()?

My guess is that in a perfect world the assert would be a return
-EINVAL, but I know that it is complicated.

> +    buf[1] = name_len;
> +    memcpy(buf+2, name, name_len);

spaces around '+' (same elsewhere)

> +    tmplen = 2+name_len;
> +    buf[tmplen++] = '\0';
> +
> +    for (t = 0; t < len; t++) {
> +        cpu_to_be64w((uint64_t *)(buf + tmplen), start_list[t]);
> +        tmplen += 8;
> +        cpu_to_be64w((uint64_t *)(buf + tmplen), end_list[t]);
> +        tmplen += 8;
           trace_qemu_savevm_send_postcopy_range(name, start_list[t], end_list[t]);

??


> +    /* We're expecting a
> +     *    Version (0)
> +     *    a RAM ID string (length byte, name, 0 term)
> +     *    then at least 1 16 byte chunk
> +    */
> +    if (len < 20) {

       1+1+1+1+2*8

Humm, thinking about it, .... why are we not needing a length field of
number of entries?
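
On the entry-count question: one way a receiver can do without an explicit count is to derive it from the command length, since every range is a fixed 16 bytes. The sketch below assumes that framing; the quoted hunk does not show whether the series does exactly this, and demo_discard_count_ranges is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Given the total command length and the RAM block name length, return the
 * number of 16-byte (start, end) ranges, or -1 if the length is malformed. */
static int demo_discard_count_ranges(size_t cmd_len, size_t name_len)
{
    /* version + name-length byte + name + NUL terminator */
    size_t header = 1 + 1 + name_len + 1;
    size_t payload;

    if (cmd_len < header + 16) {    /* need at least one range */
        return -1;
    }
    payload = cmd_len - header;
    if (payload % 16 != 0) {        /* must be whole (start, end) pairs */
        return -1;
    }
    return (int)(payload / 16);
}
```

With a one-character name the minimum is 1 + 1 + 1 + 1 + 2 * 8 = 20 bytes, which matches the constant in the quoted check.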

> +        error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
> +        return -1;
> +    }
> +
> +    tmp = qemu_get_byte(mis->file);
> +    if (tmp != 0) {

I think this deserves a named constant, POSTCOPY_VERSION0 or whatever?


* Re: [Qemu-devel] [PATCH v7 19/42] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 19/42] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
@ 2015-07-13 11:07   ` Juan Quintela
  2015-07-21  6:11   ` Amit Shah
  2015-08-04  5:27   ` Amit Shah
  2 siblings, 0 replies; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 11:07 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> MIG_CMD_PACKAGED is a migration command that wraps a chunk of migration
> stream inside a package whose length can be determined purely by reading
> its header.  The destination guarantees that the whole MIG_CMD_PACKAGED
> is read off the stream prior to parsing the contents.
>
> This is used by postcopy to load device state (from the package)
> while leaving the main stream free to receive memory pages.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>


> +/* We have a buffer of data to send; we don't want that all to be loaded
> + * by the command itself, so the command contains just the length of the
> + * extra buffer that we then send straight after it.
> + * TODO: Must be a better way to organise that

Famous words of advice O:-)


I have to read the rest of the series to make up my mind on this one.
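
For readers following along, the "length in the header, data straight after" idea from the commit message can be sketched as below. This is a minimal illustration and not the series' code: the 4-byte big-endian length field and the demo_package_* names are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 4-byte big-endian length followed by the payload.
 * Returns the number of bytes written, or 0 if the output is too small. */
static size_t demo_package_write(uint8_t *out, size_t out_cap,
                                 const uint8_t *payload, uint32_t len)
{
    if (out_cap < 4 || out_cap - 4 < len) {
        return 0;
    }
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)len;
    memcpy(out + 4, payload, len);
    return 4 + (size_t)len;
}

/* Read the header to learn the payload size, and hand the payload back only
 * once the whole package is available.  Returns 0 on success, -1 otherwise. */
static int demo_package_read(const uint8_t *in, size_t in_len,
                             const uint8_t **payload, uint32_t *len)
{
    if (in_len < 4) {
        return -1;
    }
    *len = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
    if (in_len - 4 < *len) {
        return -1;          /* package not complete yet: do not parse */
    }
    *payload = in + 4;
    return 0;
}
```

The point of the design is that the receiver never starts parsing the wrapped stream until it holds all of it, so the main stream stays free for page data.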


* Re: [Qemu-devel] [PATCH v7 20/42] Modify save_live_pending for postcopy
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 20/42] Modify save_live_pending for postcopy Dr. David Alan Gilbert (git)
@ 2015-07-13 11:12   ` Juan Quintela
  2015-07-31 16:13     ` Dr. David Alan Gilbert
  2015-07-21  6:17   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 11:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Modify save_live_pending to return separate postcopiable and
> non-postcopiable counts.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

I think that if you make a small change of meaning, everything gets easier:

> -static uint64_t block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
> +static void block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
> +                               uint64_t *non_postcopiable_pending,
> +                               uint64_t *postcopiable_pending)
>  {
>      /* Estimate pending number of bytes to send */
>      uint64_t pending;
> @@ -773,7 +775,8 @@ static uint64_t block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
>      qemu_mutex_unlock_iothread();
>  
>      DPRINTF("Enter save live pending  %" PRIu64 "\n", pending);
> -    return pending;
> +    *non_postcopiable_pending = pending;
> +    *postcopiable_pending = 0;

Change those two lines to:

       *non_postcopiable_pending += pending;
       *postcopiable_pending += 0; /* ok, equivalent of doing nothing */

This way, chaining gets easier?




> diff --git a/migration/savevm.c b/migration/savevm.c
> index 2c4cbe1..ebd3d31 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1012,10 +1012,20 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f)
>      qemu_fflush(f);
>  }
>  
> -uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
> +/* Give an estimate of the amount left to be transferred,
> + * the result is split into the amount for units that can and
> + * for units that can't do postcopy.
> + */
> +void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
> +                               uint64_t *res_non_postcopiable,
> +                               uint64_t *res_postcopiable)
>  {
>      SaveStateEntry *se;
> -    uint64_t ret = 0;
> +    uint64_t tmp_non_postcopiable, tmp_postcopiable;
> +
> +    *res_non_postcopiable = 0;
> +    *res_postcopiable = 0;
> +
>  
>      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>          if (!se->ops || !se->ops->save_live_pending) {
> @@ -1026,9 +1036,12 @@ uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
>                  continue;
>              }
>          }
> -        ret += se->ops->save_live_pending(f, se->opaque, max_size);
> +        se->ops->save_live_pending(f, se->opaque, max_size,
> +                                   &tmp_non_postcopiable, &tmp_postcopiable);
> +
> +        *res_postcopiable += tmp_postcopiable;
> +        *res_non_postcopiable += tmp_non_postcopiable;
>      }
> -    return ret;

With the change, we don't care in the other functions, and this one gets
simpler IMHO.
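
A minimal sketch of the accumulate-in-place convention suggested above, assuming the caller zeroes both totals once and every handler only adds to them. The handler names and byte counts are invented for illustration; the real handlers take more arguments.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified handler signature: each handler adds to the running totals. */
typedef void (*DemoPendingFn)(uint64_t *non_postcopiable,
                              uint64_t *postcopiable);

/* Hypothetical block handler: its data cannot be postcopied. */
static void demo_block_pending(uint64_t *non_postcopiable,
                               uint64_t *postcopiable)
{
    *non_postcopiable += 4096;
    (void)postcopiable;         /* adds nothing to this counter */
}

/* Hypothetical RAM handler: everything it has left can be postcopied. */
static void demo_ram_pending(uint64_t *non_postcopiable,
                             uint64_t *postcopiable)
{
    (void)non_postcopiable;
    *postcopiable += 8192;
}

/* The caller resets the totals once and passes them straight through,
 * so no temporaries are needed in the loop. */
static void demo_state_pending(const DemoPendingFn *handlers, size_t n,
                               uint64_t *res_non_postcopiable,
                               uint64_t *res_postcopiable)
{
    *res_non_postcopiable = 0;
    *res_postcopiable = 0;
    for (size_t i = 0; i < n; i++) {
        handlers[i](res_non_postcopiable, res_postcopiable);
    }
}
```

The trade-off is that a handler which forgets "+=" and writes "=" silently drops the totals from earlier handlers.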


* Re: [Qemu-devel] [PATCH v7 21/42] postcopy: OS support test
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 21/42] postcopy: OS support test Dr. David Alan Gilbert (git)
@ 2015-07-13 11:20   ` Juan Quintela
  2015-07-13 16:31     ` Dr. David Alan Gilbert
  2015-07-21  7:29   ` Amit Shah
  2015-08-04  5:28   ` Amit Shah
  2 siblings, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 11:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Provide a check to see if the OS we're running on has all the bits
> needed for postcopy.
>
> Creates postcopy-ram.c which will get most of the other helpers we need.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

I am guessing that the test is OK, but we redo the test each time the
function is called, and we always end up calling this kind of function
from several places.  Wouldn't it be good to rename the function to
__postcopy_ram_supported_by_host()

and add a top-level function that is:

bool postcopy_ram_supported_by_host(void)
{
        static bool first_time = true;
        static bool supported = false;

        if (first_time) {
           first_time = false;
           supported = __postcopy_ram_supported_by_host();
        }
        return supported;
}

Note that I don't know how slow the mmap + userfault thing is, but I
guess that the values would not change while running, no?

It has a Reviewed-by because I don't see anything wrong with it.


* Re: [Qemu-devel] [PATCH v7 22/42] migrate_start_postcopy: Command to trigger transition to postcopy
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 22/42] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
@ 2015-07-13 11:23   ` Juan Quintela
  2015-07-13 17:13     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 11:23 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Once postcopy is enabled (with migrate_set_capability), the migration
> will still start on precopy mode.  To cause a transition into postcopy
> the:
>
>   migrate_start_postcopy
>
> command must be issued.  Postcopy will start sometime after this
> (when it's next checked in the migration loop).
>
> Issuing the command before migration has started will error,
> and issuing after it has finished is ignored.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>

> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index a5951ac..e973490 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -111,6 +111,9 @@ struct MigrationState
>      int64_t xbzrle_cache_size;
>      int64_t setup_time;
>      int64_t dirty_sync_count;
> +
> +    /* Flag set once the migration has been asked to enter postcopy */
> +    bool start_postcopy;
>  };
>  
>  void process_incoming_migration(QEMUFile *f);
> diff --git a/migration/migration.c b/migration/migration.c
> index e77b8b4..6fc47f9 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -465,6 +465,28 @@ void qmp_migrate_set_parameters(bool has_compress_level,
>      }
>  }
>  
> +void qmp_migrate_start_postcopy(Error **errp)
> +{
> +    MigrationState *s = migrate_get_current();
> +
> +    if (!migrate_postcopy_ram()) {
> +        error_setg(errp, "Enable postcopy with migration_set_capability before"
> +                         " the start of migration");
> +        return;
> +    }
> +
> +    if (s->state == MIGRATION_STATUS_NONE) {

I would claim that this check should be:

    if (s->state != MIGRATION_STATUS_ACTIVE) {
??

FAILED, COMPLETED, CANCELL* don't make sense, right?
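
To spell out the difference between the two checks, here is a small sketch with an illustrative status enum (DEMO_MIG_* are placeholder names, not QEMU's MigrationStatus values). Note that the stricter form also rejects the command while migration is still in setup, and turns the "ignored after it has finished" case from the commit message into an error.

```c
#include <stdbool.h>

/* Hypothetical status set, for illustration only. */
typedef enum {
    DEMO_MIG_NONE,
    DEMO_MIG_SETUP,
    DEMO_MIG_ACTIVE,
    DEMO_MIG_CANCELLING,
    DEMO_MIG_CANCELLED,
    DEMO_MIG_COMPLETED,
    DEMO_MIG_FAILED,
} DemoMigStatus;

/* Quoted check: reject only when migration has not been started. */
static bool demo_start_postcopy_rejected_v7(DemoMigStatus s)
{
    return s == DEMO_MIG_NONE;
}

/* Suggested check: reject in every state other than active. */
static bool demo_start_postcopy_rejected_strict(DemoMigStatus s)
{
    return s != DEMO_MIG_ACTIVE;
}
```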

Thanks, Juan.


* Re: [Qemu-devel] [PATCH v7 23/42] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 23/42] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
@ 2015-07-13 11:27   ` Juan Quintela
  2015-07-13 15:53     ` Dr. David Alan Gilbert
  2015-07-21 10:33   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 11:27 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> 'MIGRATION_STATUS_POSTCOPY_ACTIVE' is entered after migrate_start_postcopy
>
> 'migration_postcopy_phase' is provided for other sections to know if
> they're in postcopy.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>


But (there is always a but....)


> @@ -358,6 +359,39 @@ MigrationInfo *qmp_query_migrate(Error **errp)
>  
>          get_xbzrle_cache_stats(info);
>          break;
> +    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> +        /* Mostly the same as active; TODO add some postcopy stats */
> +        info->has_status = true;
> +        info->has_total_time = true;
> +        info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME)
> +            - s->total_time;
> +        info->has_expected_downtime = true;
> +        info->expected_downtime = s->expected_downtime;
> +        info->has_setup_time = true;
> +        info->setup_time = s->setup_time;
> +
> +        info->has_ram = true;
> +        info->ram = g_malloc0(sizeof(*info->ram));
> +        info->ram->transferred = ram_bytes_transferred();
> +        info->ram->remaining = ram_bytes_remaining();
> +        info->ram->total = ram_bytes_total();
> +        info->ram->duplicate = dup_mig_pages_transferred();
> +        info->ram->skipped = skipped_mig_pages_transferred();
> +        info->ram->normal = norm_mig_pages_transferred();
> +        info->ram->normal_bytes = norm_mig_bytes_transferred();
> +        info->ram->dirty_pages_rate = s->dirty_pages_rate;
> +        info->ram->mbps = s->mbps;
> +
> +        if (blk_mig_active()) {
> +            info->has_disk = true;
> +            info->disk = g_malloc0(sizeof(*info->disk));
> +            info->disk->transferred = blk_mig_bytes_transferred();
> +            info->disk->remaining = blk_mig_bytes_remaining();
> +            info->disk->total = blk_mig_bytes_total();
> +        }

Can we have block migration active with postcopy?  I would assume that
this would cause disk corruption, no?  Or if you prefer the other
question, what protects us from disk corruption?

Once here, I guess we can get the migrate_already_active() bit without
problem?

Later, Juan.


* Re: [Qemu-devel] [PATCH v7 24/42] Add qemu_savevm_state_complete_postcopy
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 24/42] Add qemu_savevm_state_complete_postcopy Dr. David Alan Gilbert (git)
@ 2015-07-13 11:35   ` Juan Quintela
  2015-07-13 15:33     ` Dr. David Alan Gilbert
  2015-07-21 10:42   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 11:35 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Add qemu_savevm_state_complete_postcopy to complement
> qemu_savevm_state_complete_precopy together with a new
> save_live_complete_postcopy method on devices.
>
> The save_live_complete_precopy method is called on
> all devices during a precopy migration, and all non-postcopy
> devices during a postcopy migration at the transition.
>
> The save_live_complete_postcopy method is called at
> the end of postcopy for all postcopiable devices.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>


> @@ -947,13 +987,15 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f)
>      int vmdesc_len;
>      SaveStateEntry *se;
>      int ret;
> +    bool in_postcopy = migration_postcopy_phase(migrate_get_current());
>  
>      trace_savevm_state_complete_precopy();
>  
>      cpu_synchronize_all_states();
>  
>      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> -        if (!se->ops || !se->ops->save_live_complete_precopy) {
> +        if (!se->ops || !se->ops->save_live_complete_precopy ||
> +            (in_postcopy && se->ops->save_live_complete_postcopy)) {
>              continue;
>          }

I would change the formatting to something like:

       if (!se->ops ||
           (in_postcopy && se->ops->save_live_complete_postcopy) ||
           !se->ops->save_live_complete_precopy) {
           continue;
       }

Just to make it easier to see when we exit?

Later, Juan.


* Re: [Qemu-devel] [PATCH v7 25/42] Postcopy: Maintain sentmap and calculate discard
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 25/42] Postcopy: Maintain sentmap and calculate discard Dr. David Alan Gilbert (git)
@ 2015-07-13 11:47   ` Juan Quintela
  2015-09-15 17:01     ` Dr. David Alan Gilbert
  2015-07-21 11:36   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 11:47 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Where postcopy is preceeded by a period of precopy, the destination will
> have received pages that may have been dirtied on the source after the
> page was sent.  The destination must throw these pages away before
> starting it's CPUs.
>
> Maintain a 'sentmap' of pages that have already been sent.
> Calculate list of sent & dirty pages
> Provide helpers on the destination side to discard these.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Not a patch without a suggestion O:-)

> ---
>  include/migration/migration.h    |  12 +++
>  include/migration/postcopy-ram.h |  35 +++++++
>  include/qemu/typedefs.h          |   1 +
>  migration/migration.c            |   1 +
>  migration/postcopy-ram.c         | 108 +++++++++++++++++++++
>  migration/ram.c                  | 203 ++++++++++++++++++++++++++++++++++++++-
>  migration/savevm.c               |   2 -
>  trace-events                     |   5 +
>  8 files changed, 363 insertions(+), 4 deletions(-)
>
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 2a22381..4c6cf95 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -114,6 +114,13 @@ struct MigrationState
>  
>      /* Flag set once the migration has been asked to enter postcopy */
>      bool start_postcopy;
> +
> +    /* bitmap of pages that have been sent at least once
> +     * only maintained and used in postcopy at the moment
> +     * where it's used to send the dirtymap at the start
> +     * of the postcopy phase
> +     */
> +    unsigned long *sentmap;
>  };

We can use this sentmap for zero page optimization.  If a page is in the
sentmap, we need to send a zero page; otherwise, just send the sentmap at
the end of migration and clean everything not there?

> +/*
> + * Discard the contents of memory start..end inclusive.
> + * We can assume that if we've been called postcopy_ram_hosttest returned true
> + */
> +int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
> +                               uint8_t *end)
> +{
> +    trace_postcopy_ram_discard_range(start, end);
> +    if (madvise(start, (end-start)+1, MADV_DONTNEED)) {

Can we s/end/length/ and adjust everywhere?
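
A sketch of what the length-based signature could look like, next to a wrapper that keeps the inclusive-end form from the quoted patch. The demo_* names are hypothetical, the code is Linux-specific, and start is assumed to be page aligned.

```c
#define _DEFAULT_SOURCE     /* madvise() and MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Discard the contents of [start, start + length).
 * Returns 0 on success, -1 on failure (errno is set by madvise()). */
static int demo_ram_discard_range(uint8_t *start, size_t length)
{
    return madvise(start, length, MADV_DONTNEED) ? -1 : 0;
}

/* Inclusive-end variant, equivalent to the quoted (end - start) + 1 form. */
static int demo_ram_discard_inclusive(uint8_t *start, uint8_t *end)
{
    return demo_ram_discard_range(start, (size_t)(end - start) + 1);
}
```

On Linux, a private anonymous mapping reads back as zeroes after MADV_DONTNEED, which is the behaviour the discard relies on; passing a length avoids the off-by-one arithmetic at every call site.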


Not here, but how about putting a comment explaining where the magic 12
comes from on the definition of the constant?

I think that the sentmap bits could be used without the rest.


* Re: [Qemu-devel] [PATCH v7 26/42] postcopy: Incoming initialisation
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 26/42] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
@ 2015-07-13 12:04   ` Juan Quintela
  2015-09-23 19:06     ` Dr. David Alan Gilbert
  2015-07-22  6:19   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 12:04 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  include/migration/migration.h    |   3 +
>  include/migration/postcopy-ram.h |  12 ++++
>  migration/postcopy-ram.c         | 116 +++++++++++++++++++++++++++++++++++++++
>  migration/ram.c                  |  11 ++++
>  migration/savevm.c               |   4 ++
>  trace-events                     |   2 +
>  6 files changed, 148 insertions(+)
>

qemu_hugepage_enable(host_addr, length)?

> +#ifdef MADV_NOHUGEPAGE
> +    if (madvise(host_addr, length, MADV_NOHUGEPAGE)) {
> +        error_report("%s: NOHUGEPAGE: %s", __func__, strerror(errno));
> +        return -1;
> +    }
> +#endif

qemu_hugepage_disable(host_addr, length)?

> +#ifdef MADV_HUGEPAGE
> +    if (madvise(host_addr, length, MADV_HUGEPAGE)) {
> +        error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
> +        return -1;
> +    }
> +#endif
> +
> +    /*
> +     * We can also turn off userfault now since we should have all the
> +     * pages.   It can be useful to leave it on to debug postcopy
> +     * if you're not sure it's always getting every page.
> +     */

qemu_userfault_unregister(host_addr, length)?

> +    range_struct.start = (uintptr_t)host_addr;
> +    range_struct.len = length;
> +
> +    if (ioctl(mis->userfault_fd, UFFDIO_UNREGISTER, &range_struct)) {
> +        error_report("%s: userfault unregister %s", __func__, strerror(errno));
> +
> +        return -1;
> +    }

>  
> +/*
> + * Allocate data structures etc needed by incoming migration with postcopy-ram
> + * postcopy-ram's similarly names postcopy_ram_incoming_init does the work
> + */
> +int ram_postcopy_incoming_init(MigrationIncomingState *mis)
> +{
> +    size_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
> +
> +    return postcopy_ram_incoming_init(mis, ram_pages);
> +}
> +

ram_postcopy_incoming_init()
and
postcopy_ram_incoming_init()

ouch  Thinking about better names ....



>  static int ram_load(QEMUFile *f, void *opaque, int version_id)
>  {
>      int flags = 0, ret = 0;
> diff --git a/migration/savevm.c b/migration/savevm.c
> index e6398dd..f4de52d 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1238,6 +1238,10 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
>          return -1;
>      }
>  
> +    if (ram_postcopy_incoming_init(mis)) {
> +        return -1;
> +    }
> +

How/where do we know that this is called soon enough?

>      postcopy_state_set(mis, POSTCOPY_INCOMING_ADVISE);
>  
>      return 0;
> diff --git a/trace-events b/trace-events
> index 5e8a120..2ffc1c6 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1498,7 +1498,9 @@ rdma_start_outgoing_migration_after_rdma_source_init(void) ""
>  
>  # migration/postcopy-ram.c
>  postcopy_discard_send_finish(const char *ramblock, int nwords, int ncmds) "%s mask words sent=%d in %d commands"
> +postcopy_cleanup_area(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
>  postcopy_ram_discard_range(void *start, void *end) "%p,%p"
> +postcopy_init_area(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"

While here, since we already have "range" names above, what about:

postcopy_ram_cleanup_range()
postcopy_ram_init_range()

And leave the ram* functions the same?


* Re: [Qemu-devel] [PATCH v7 27/42] postcopy: ram_enable_notify to switch on userfault
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 27/42] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
@ 2015-07-13 12:10   ` Juan Quintela
  2015-07-13 17:36     ` Dr. David Alan Gilbert
  2015-07-23  5:22   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 12:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Mark the area of RAM as 'userfault'
> Start up a fault-thread to handle any userfaults we might receive
> from it (to be filled in later)
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  include/migration/migration.h    |  3 ++
>  include/migration/postcopy-ram.h |  6 ++++
>  migration/postcopy-ram.c         | 69 +++++++++++++++++++++++++++++++++++++++-
>  migration/savevm.c               |  9 ++++++
>  4 files changed, 86 insertions(+), 1 deletion(-)
>
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 98e2568..e6585c5 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -69,6 +69,9 @@ struct MigrationIncomingState {
>       */
>      QemuEvent      main_thread_load_event;
>  
> +    QemuThread     fault_thread;

Best name ever?  Well, it could be "wrong_thread" O:-)

> +static void *postcopy_ram_fault_thread(void *opaque)
> +{
> +    MigrationIncomingState *mis = (MigrationIncomingState *)opaque;

Unneeded cast.

> +
> +    fprintf(stderr, "postcopy_ram_fault_thread\n");
> +    /* TODO: In later patch */
> +    qemu_sem_post(&mis->fault_thread_sem);
> +    while (1) {
> +        /* TODO: In later patch */
> +    }
> +
> +    return NULL;
> +}
> +
> +int postcopy_ram_enable_notify(MigrationIncomingState *mis)
> +{
> +    /* Create the fault handler thread and wait for it to be ready */
> +    qemu_sem_init(&mis->fault_thread_sem, 0);
> +    qemu_thread_create(&mis->fault_thread, "postcopy/fault",
> +                       postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
> +    qemu_sem_wait(&mis->fault_thread_sem);
> +    qemu_sem_destroy(&mis->fault_thread_sem);
> +
> +    /* Mark so that we get notified of accesses to unwritten areas */
> +    if (qemu_ram_foreach_block(ram_block_enable_notify, mis)) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
>  #else
>  /* No target OS support, stubs just fail */
> -

This belongs in a different patch O:-)

If you have to resend, just change them.

Reviewed-by: Juan Quintela <quintela@redhat.com>
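
For reference, the startup handshake in the quoted postcopy_ram_enable_notify (create the thread, then wait until it reports ready) can be sketched with POSIX threads and semaphores in place of QEMU's wrappers. The demo_* names are hypothetical; unlike the quoted code, this sketch lets the thread return and destroys the semaphore only after joining it. Build with -pthread.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

typedef struct DemoIncomingState {
    pthread_t fault_thread;
    sem_t fault_thread_sem;
    int fault_thread_ready;
} DemoIncomingState;

static void *demo_fault_thread(void *opaque)
{
    /* void * converts implicitly in C, so no cast is needed here. */
    DemoIncomingState *mis = opaque;

    mis->fault_thread_ready = 1;
    sem_post(&mis->fault_thread_sem);   /* tell the creator we are running */
    /* A real fault thread would now loop servicing faults. */
    return NULL;
}

/* Create the fault thread and wait for it to be ready. */
static int demo_enable_notify(DemoIncomingState *mis)
{
    mis->fault_thread_ready = 0;
    if (sem_init(&mis->fault_thread_sem, 0, 0)) {
        return -1;
    }
    if (pthread_create(&mis->fault_thread, NULL, demo_fault_thread, mis)) {
        sem_destroy(&mis->fault_thread_sem);
        return -1;
    }
    while (sem_wait(&mis->fault_thread_sem) != 0) {
        /* retry if interrupted by a signal */
    }
    return 0;
}

/* Join the thread, then release the semaphore. */
static int demo_disable_notify(DemoIncomingState *mis)
{
    if (pthread_join(mis->fault_thread, NULL)) {
        return -1;
    }
    return sem_destroy(&mis->fault_thread_sem) ? -1 : 0;
}
```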


* Re: [Qemu-devel] [PATCH v7 11/42] Return path: socket_writev_buffer: Block even on non-blocking fd's
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 11/42] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
  2015-06-17 12:28   ` Juan Quintela
@ 2015-07-13 12:37   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-13 12:37 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:24], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> The destination sets the fd to non-blocking on incoming migrations;
> this also affects the return path from the destination, and thus we
> need to make sure we can safely write to the return path.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit


* Re: [Qemu-devel] [PATCH v7 12/42] Migration commands
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 12/42] Migration commands Dr. David Alan Gilbert (git)
  2015-06-17 12:31   ` Juan Quintela
@ 2015-07-13 12:45   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-13 12:45 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:25], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Create QEMU_VM_COMMAND section type for sending commands from
> source to destination.  These commands are not intended to convey
> guest state but to control the migration process.
> 
> For use in postcopy.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit


* Re: [Qemu-devel] [PATCH v7 13/42] Return path: Control commands
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 13/42] Return path: Control commands Dr. David Alan Gilbert (git)
  2015-06-17 12:49   ` Juan Quintela
@ 2015-07-13 12:55   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-13 12:55 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:26], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add two src->dest commands:
>    * OPEN_RETURN_PATH - To request that the destination open the return path
>    * PING - Request an acknowledge from the destination
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit


* Re: [Qemu-devel] [PATCH v7 28/42] Postcopy: Postcopy startup in migration thread
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 28/42] Postcopy: Postcopy startup in migration thread Dr. David Alan Gilbert (git)
@ 2015-07-13 12:56   ` Juan Quintela
  2015-07-13 17:56     ` Dr. David Alan Gilbert
  2015-07-23  5:55   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 12:56 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Rework the migration thread to setup and start postcopy.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h |   3 +
>  migration/migration.c         | 166 ++++++++++++++++++++++++++++++++++++++++--
>  trace-events                  |   4 +
>  3 files changed, 167 insertions(+), 6 deletions(-)
>
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index e6585c5..68a1731 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -120,6 +120,9 @@ struct MigrationState
>      /* Flag set once the migration has been asked to enter postcopy */
>      bool start_postcopy;
>  
> +    /* Flag set once the migration thread is running (and needs joining) */
> +    bool started_migration_thread;
> +

migration_thread_started?

> +
> +    /*
> +     * send rest of state - note things that are doing postcopy
> +     * will notice we're in POSTCOPY_ACTIVE and not actually
> +     * wrap their state up here
> +     */
> +    qemu_file_set_rate_limit(ms->file, INT64_MAX);

Do we undo this?  or, are we sure that it is ok to maximize network
output?

> +    /* Ping just for debugging, helps line traces up */
> +    qemu_savevm_send_ping(ms->file, 2);

Change the values 1, 2, 3 to constants?

> +     * We need to leave the fd free for page transfers during the
> +     * loading of the device state, so wrap all the remaining
> +     * commands and state into a package that gets sent in one go
> +     */
> +    QEMUFile *fb = qemu_bufopen("w", NULL);
> +    if (!fb) {
> +        error_report("Failed to create buffered file");
> +        goto fail;
> +    }
> +
> +    qemu_savevm_state_complete_precopy(fb);
> +    qemu_savevm_send_ping(fb, 3);
> +
> +    qemu_savevm_send_postcopy_run(fb);
> +
> +    /* <><> end of stuff going into the package */
> +    qsb = qemu_buf_get(fb);
> +
> +    /* Now send that blob */
> +    if (qemu_savevm_send_packaged(ms->file, qsb)) {
> +        goto fail_closefb;
> +    }
> +    qemu_fclose(fb);

Why can't we send this directly without the extra copy?
I guess that there are some missing/extra section starts/end whatever?
Anything specific?

> +    ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;

Now that we are here, is there a counter of the time that the
postcopy stage takes?  Just curious.
> +/*
>   * Master migration thread on the source VM.
>   * It drives the migration and pumps the data down the outgoing channel.
>   */
>  static void *migration_thread(void *opaque)
>  {
>      MigrationState *s = opaque;
> +    /* Used by the bandwidth calcs, updated later */
>      int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>      int64_t initial_bytes = 0;
>      int64_t max_size = 0;
>      int64_t start_time = initial_time;
>      bool old_vm_running = false;
> +    bool entered_postcopy = false;
> +    /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
> +    enum MigrationStatus current_active_type = MIGRATION_STATUS_ACTIVE;

current_active_state?


* Re: [Qemu-devel] [PATCH v7 29/42] Postcopy end in migration_thread
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 29/42] Postcopy end in migration_thread Dr. David Alan Gilbert (git)
@ 2015-07-13 13:15   ` Juan Quintela
  2015-07-23  6:41     ` Amit Shah
  2015-08-04 11:31     ` Dr. David Alan Gilbert
  2015-07-23  6:41   ` Amit Shah
  1 sibling, 2 replies; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 13:15 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The end of migration in postcopy is a bit different since some of
> the things normally done at the end of migration have already been
> done on the transition to postcopy.
>
> The end of migration code is getting a bit complicated now, so
> move it out into its own function.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

I think that I would split out the function first and then add the postcopy code.

BTW, it is a local function, we can use shorter names:

migration_completion()?

trace names specifically get hugggggggggge.


> +static void migration_thread_end_of_iteration(MigrationState *s,
> +                                              int current_active_state,

RunState?
And it is not needed as parameter.


> +                                              bool *old_vm_running,
> +                                              int64_t *start_time)
> +{
> +    int ret;
> +    if (s->state == MIGRATION_STATUS_ACTIVE) {
           current_active_state = s->state;
> +        qemu_mutex_lock_iothread();
> +        *start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +        qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> +        *old_vm_running = runstate_is_running();
> +
> +        ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> +        if (ret >= 0) {
> +            qemu_file_set_rate_limit(s->file, INT64_MAX);
> +            qemu_savevm_state_complete_precopy(s->file);
> +        }
> +        qemu_mutex_unlock_iothread();
> +
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +    } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
           current_active_state = s->state;
> +        trace_migration_thread_end_of_iteration_postcopy_end();
> +
> +        qemu_savevm_state_complete_postcopy(s->file);
> +        trace_migration_thread_end_of_iteration_postcopy_end_after_complete();
> +    }
> +
> +    /*
> +     * If rp was opened we must clean up the thread before
> +     * cleaning everything else up (since if there are no failures
> +     * it will wait for the destination to send its status in
> +     * a SHUT command).
> +     * Postcopy opens rp if enabled (even if it's not activated)
> +     */
> +    if (migrate_postcopy_ram()) {
> +        int rp_error;
> +        trace_migration_thread_end_of_iteration_postcopy_end_before_rp();
> +        rp_error = await_return_path_close_on_source(s);
> +        trace_migration_thread_end_of_iteration_postcopy_end_after_rp(rp_error);
> +        if (rp_error) {
> +            goto fail;
> +        }
> +    }
> +
> +    if (qemu_file_get_error(s->file)) {
> +        trace_migration_thread_end_of_iteration_file_err();
> +        goto fail;
> +    }
> +
> +    migrate_set_state(s, current_active_state, MIGRATION_STATUS_COMPLETED);
> +    return;
> +
> +fail:
> +    migrate_set_state(s, current_active_state, MIGRATION_STATUS_FAILED);
> +}
> +
> +/*
>   * Master migration thread on the source VM.
>   * It drives the migration and pumps the data down the outgoing channel.
>   */
> @@ -1233,31 +1294,11 @@ static void *migration_thread(void *opaque)
>                  /* Just another iteration step */
>                  qemu_savevm_state_iterate(s->file);
>              } else {
> -                int ret;
> -
> -                qemu_mutex_lock_iothread();
> -                start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> -                qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> -                old_vm_running = runstate_is_running();
> -
> -                ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> -                if (ret >= 0) {
> -                    qemu_file_set_rate_limit(s->file, INT64_MAX);
> -                    qemu_savevm_state_complete_precopy(s->file);
> -                }
> -                qemu_mutex_unlock_iothread();
> +                trace_migration_thread_low_pending(pending_size);
>  
> -                if (ret < 0) {
> -                    migrate_set_state(s, MIGRATION_STATUS_ACTIVE,
> -                                      MIGRATION_STATUS_FAILED);
> -                    break;
> -                }
> -
> -                if (!qemu_file_get_error(s->file)) {
> -                    migrate_set_state(s, MIGRATION_STATUS_ACTIVE,
> -                                      MIGRATION_STATUS_COMPLETED);
> -                    break;
> -                }
> +                migration_thread_end_of_iteration(s, current_active_type,
> +                    &old_vm_running, &start_time);
> +                break;
>              }
>          }
>  
> diff --git a/trace-events b/trace-events
> index f096877..528d5a3 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1425,6 +1425,12 @@ migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
>  migration_thread_after_loop(void) ""
>  migration_thread_file_err(void) ""
>  migration_thread_setup_complete(void) ""
> +migration_thread_low_pending(uint64_t pending) "%" PRIu64
> +migration_thread_end_of_iteration_file_err(void) ""
> +migration_thread_end_of_iteration_postcopy_end(void) ""
> +migration_thread_end_of_iteration_postcopy_end_after_complete(void) ""
> +migration_thread_end_of_iteration_postcopy_end_before_rp(void) ""
> +migration_thread_end_of_iteration_postcopy_end_after_rp(int rp_error) "%d"
>  open_return_path_on_source(void) ""
>  open_return_path_on_source_continue(void) ""
>  postcopy_start(void) ""


* Re: [Qemu-devel] [PATCH v7 30/42] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 30/42] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command Dr. David Alan Gilbert (git)
@ 2015-07-13 13:24   ` Juan Quintela
  2015-08-06 14:15     ` Dr. David Alan Gilbert
  2015-07-23  6:50   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 13:24 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Add MIG_RP_MSG_REQ_PAGES command on Return path for the postcopy
> destination to request a page from the source.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  include/migration/migration.h |  4 +++
>  migration/migration.c         | 70 +++++++++++++++++++++++++++++++++++++++++++
>  trace-events                  |  1 +
>  3 files changed, 75 insertions(+)
>
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 68a1731..8742d53 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -47,6 +47,8 @@ enum mig_rp_message_type {
>      MIG_RP_MSG_INVALID = 0,  /* Must be 0 */
>      MIG_RP_MSG_SHUT,         /* sibling will not send any more RP messages */
>      MIG_RP_MSG_PONG,         /* Response to a PING; data (seq: be32 ) */
> +
> +    MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be64) */

Not that I really care, but I think that len could be 32 bits.  I am
not seeing networking getting good at multi-gigabyte transfers soon O:-)


> +void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
> +                              ram_addr_t start, ram_addr_t len);

Shouldn't len be a size_t?
(yes, I know that migration code is not really consistent about that)

>  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
>  void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
> diff --git a/migration/migration.c b/migration/migration.c
> index 3e5a7c8..0373b77 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -113,6 +113,36 @@ static void deferred_incoming_migration(Error **errp)
>      deferred_incoming = true;
>  }
>  
> +/* Request a range of pages from the source VM at the given
> + * start address.
> + *   rbname: Name of the RAMBlock to request the page in, if NULL it's the same
> + *           as the last request (a name must have been given previously)
> + *   Start: Address offset within the RB
> + *   Len: Length in bytes required - must be a multiple of pagesize
> + */
> +void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
> +                               ram_addr_t start, ram_addr_t len)
> +{
> +    uint8_t bufc[16+1+255]; /* start (8 byte), len (8 byte), rbname upto 256 */
> +    uint64_t *buf64 = (uint64_t *)bufc;
> +    size_t msglen = 16; /* start + len */
> +
> +    assert(!(len & 1));

ohhhh, why can't we get a real flags field?

Scratch that.  Seeing the rest of the code, can't we have two commands:

MIG_RP_MSG_REQ_PAGES
MIG_RP_MSG_REQ_PAGES_WITH_ID

I am not really sure that it makes sense getting a command that can be
of two different lengths only for that?


I am not sure, but having a command with two different payloads looks
strange.
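
As a rough sketch of the two-command idea (not code from this series): the
command numbers and helper names below are hypothetical, and the name layout
(one length byte followed by up to 255 name bytes) is only inferred from the
bufc[16+1+255] buffer in the patch.  The point is that the command number,
rather than a bit stolen from len, tells the source which payload to expect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command numbers; the patch only defines MIG_RP_MSG_REQ_PAGES. */
enum {
    SKETCH_REQ_PAGES = 3,
    SKETCH_REQ_PAGES_WITH_ID = 4,
};

/* Store a 64-bit value big-endian, independent of host byte order. */
static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/*
 * Build the payload into buf (caller provides at least 16 + 1 + 255 bytes).
 * Returns the payload length and stores the command to send in *cmd.
 */
static size_t build_req_pages(uint8_t *buf, int *cmd, const char *rbname,
                              uint64_t start, uint64_t len)
{
    size_t msglen = 16; /* start (be64) + len (be64) */

    put_be64(buf, start);
    put_be64(buf + 8, len);

    if (rbname) {
        /* Assumed layout: one length byte, then the name without a NUL. */
        size_t namelen = strlen(rbname);
        assert(namelen < 256);
        buf[msglen++] = (uint8_t)namelen;
        memcpy(buf + msglen, rbname, namelen);
        msglen += namelen;
        *cmd = SKETCH_REQ_PAGES_WITH_ID;
    } else {
        *cmd = SKETCH_REQ_PAGES;
    }
    return msglen;
}
```

With this shape the plain command always carries exactly 16 bytes, so the
receiver can check the length per command instead of decoding a flag first.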

Later, Juan.


* Re: [Qemu-devel] [PATCH v7 24/42] Add qemu_savevm_state_complete_postcopy
  2015-07-13 11:35   ` Juan Quintela
@ 2015-07-13 15:33     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-13 15:33 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Add qemu_savevm_state_complete_postcopy to complement
> > qemu_savevm_state_complete_precopy together with a new
> > save_live_complete_postcopy method on devices.
> >
> > The save_live_complete_precopy method is called on
> > all devices during a precopy migration, and all non-postcopy
> > devices during a postcopy migration at the transition.
> >
> > The save_live_complete_postcopy method is called at
> > the end of postcopy for all postcopiable devices.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>

Thanks.

> 
> > @@ -947,13 +987,15 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f)
> >      int vmdesc_len;
> >      SaveStateEntry *se;
> >      int ret;
> > +    bool in_postcopy = migration_postcopy_phase(migrate_get_current());
> >  
> >      trace_savevm_state_complete_precopy();
> >  
> >      cpu_synchronize_all_states();
> >  
> >      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> > -        if (!se->ops || !se->ops->save_live_complete_precopy) {
> > +        if (!se->ops || !se->ops->save_live_complete_precopy ||
> > +            (in_postcopy && se->ops->save_live_complete_postcopy)) {
> >              continue;
> >          }
> 
> I would change the formatting to something like:
> 
>        if (!se->ops ||
>            (in_postcopy && se->ops->save_live_complete_postcopy) ||
>            !se->ops->save_live_complete_precopy) {
>            continue;
>        }
> 
> Just to make easier to see when we exit?

Done

Dave (Starting with the easy fix first :-)

> 
> Later, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 23/42] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
  2015-07-13 11:27   ` Juan Quintela
@ 2015-07-13 15:53     ` Dr. David Alan Gilbert
  2015-07-13 16:26       ` Juan Quintela
  0 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-13 15:53 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > 'MIGRATION_STATUS_POSTCOPY_ACTIVE' is entered after migrate_start_postcopy
> >
> > 'migration_postcopy_phase' is provided for other sections to know if
> > they're in postcopy.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > Reviewed-by: Eric Blake <eblake@redhat.com>
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> 
> 
> But (there is always a but....)
> 
> 
> > @@ -358,6 +359,39 @@ MigrationInfo *qmp_query_migrate(Error **errp)
> >  
> >          get_xbzrle_cache_stats(info);
> >          break;
> > +    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> > +        /* Mostly the same as active; TODO add some postcopy stats */
> > +        info->has_status = true;
> > +        info->has_total_time = true;
> > +        info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME)
> > +            - s->total_time;
> > +        info->has_expected_downtime = true;
> > +        info->expected_downtime = s->expected_downtime;
> > +        info->has_setup_time = true;
> > +        info->setup_time = s->setup_time;
> > +
> > +        info->has_ram = true;
> > +        info->ram = g_malloc0(sizeof(*info->ram));
> > +        info->ram->transferred = ram_bytes_transferred();
> > +        info->ram->remaining = ram_bytes_remaining();
> > +        info->ram->total = ram_bytes_total();
> > +        info->ram->duplicate = dup_mig_pages_transferred();
> > +        info->ram->skipped = skipped_mig_pages_transferred();
> > +        info->ram->normal = norm_mig_pages_transferred();
> > +        info->ram->normal_bytes = norm_mig_bytes_transferred();
> > +        info->ram->dirty_pages_rate = s->dirty_pages_rate;
> > +        info->ram->mbps = s->mbps;
> > +
> > +        if (blk_mig_active()) {
> > +            info->has_disk = true;
> > +            info->disk = g_malloc0(sizeof(*info->disk));
> > +            info->disk->transferred = blk_mig_bytes_transferred();
> > +            info->disk->remaining = blk_mig_bytes_remaining();
> > +            info->disk->total = blk_mig_bytes_total();
> > +        }
> 
> Can we have block migration active with postcopy?  I would assume that
> this would get disk corruption, no?  Or if you prefer the other
> question, what protects us from disk corruption?

I think you can, I've not tried it; however I also think it should
be safe.

 migration/block.c's block_save_pending always puts a value in the
non_postcopiable_pending return value (and 0 in the postcopiable_pending);
the migrate thread checks the non_postcopiable_pending size to
decide when it can switch to postcopy, and performs a call to the complete
method on each device before it does.  Thus the block migration should
be finished before we start doing the actual postcopy stage, and thus
before the destination CPU starts running.
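
Roughly, the check described above could be sketched like this.  It is only
an illustration: the struct, the helper names and the comparison against a
threshold are assumptions made for the sketch, not code from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for what one device reports from its pending hook. */
typedef struct {
    uint64_t non_postcopiable; /* must be sent before the destination runs */
    uint64_t postcopiable;     /* may still be sent after it starts running */
} PendingSizes;

/* Block migration as described above: everything is non-postcopiable. */
static PendingSizes block_pending_sketch(uint64_t bytes_left)
{
    PendingSizes p = { .non_postcopiable = bytes_left, .postcopiable = 0 };
    return p;
}

/* RAM with postcopy enabled: assumed here to report all as postcopiable. */
static PendingSizes ram_pending_sketch(uint64_t bytes_left)
{
    PendingSizes p = { .non_postcopiable = 0, .postcopiable = bytes_left };
    return p;
}

/*
 * The switch is allowed only once postcopy has been requested and the
 * summed non-postcopiable data is at or below the threshold (max_size).
 */
static bool can_start_postcopy_sketch(const PendingSizes *devs, int n,
                                      bool start_requested, uint64_t max_size)
{
    uint64_t non_pc = 0;

    for (int i = 0; i < n; i++) {
        non_pc += devs[i].non_postcopiable;
    }
    return start_requested && non_pc <= max_size;
}
```

So while block migration still reports outstanding data, the sum stays above
the threshold and the switch to postcopy is held back.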

A possibly harder question is what happens if block.c did implement
postcopy and you had both block postcopy and ram postcopy active at
the same time; again I think it should work but I'm not sure if one
would starve the other.

> Once here, I guess we can get the migrate_already_active() bit without
> problem?

I'm not sure of the question here; but the idea of migration_already_active()
is just to avoid all of the open-coded checks for each possible state;
now that we've added another state they were getting messy.

Dave

> 
> Later, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 23/42] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
  2015-07-13 15:53     ` Dr. David Alan Gilbert
@ 2015-07-13 16:26       ` Juan Quintela
  2015-07-13 16:48         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 16:26 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
>> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
>> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> >
>> > 'MIGRATION_STATUS_POSTCOPY_ACTIVE' is entered after migrate_start_postcopy
>> >
>> > 'migration_postcopy_phase' is provided for other sections to know if
>> > they're in postcopy.
>> >
>> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
>> > Reviewed-by: Eric Blake <eblake@redhat.com>
>> 
>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>> 
>> 
>> But (there is always a but....)
>> 
>> 
>> > @@ -358,6 +359,39 @@ MigrationInfo *qmp_query_migrate(Error **errp)
>> >  
>> >          get_xbzrle_cache_stats(info);
>> >          break;
>> > +    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
>> > +        /* Mostly the same as active; TODO add some postcopy stats */
>> > +        info->has_status = true;
>> > +        info->has_total_time = true;
>> > +        info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME)
>> > +            - s->total_time;
>> > +        info->has_expected_downtime = true;
>> > +        info->expected_downtime = s->expected_downtime;
>> > +        info->has_setup_time = true;
>> > +        info->setup_time = s->setup_time;
>> > +
>> > +        info->has_ram = true;
>> > +        info->ram = g_malloc0(sizeof(*info->ram));
>> > +        info->ram->transferred = ram_bytes_transferred();
>> > +        info->ram->remaining = ram_bytes_remaining();
>> > +        info->ram->total = ram_bytes_total();
>> > +        info->ram->duplicate = dup_mig_pages_transferred();
>> > +        info->ram->skipped = skipped_mig_pages_transferred();
>> > +        info->ram->normal = norm_mig_pages_transferred();
>> > +        info->ram->normal_bytes = norm_mig_bytes_transferred();
>> > +        info->ram->dirty_pages_rate = s->dirty_pages_rate;
>> > +        info->ram->mbps = s->mbps;
>> > +
>> > +        if (blk_mig_active()) {
>> > +            info->has_disk = true;
>> > +            info->disk = g_malloc0(sizeof(*info->disk));
>> > +            info->disk->transferred = blk_mig_bytes_transferred();
>> > +            info->disk->remaining = blk_mig_bytes_remaining();
>> > +            info->disk->total = blk_mig_bytes_total();
>> > +        }
>> 
>> Can we have block migration active with postcopy?  I would assume that
>> this would get disk corruption, no?  Or if you prefer the other
>> question, what protects us from disk corruption?
>
> I think you can, I've not tried it; however I also think it should
> be safe.
>
>  migration/block.c's block_save_pending always puts a value in the
> non_postcopiable_pending return value (and 0 in the postcopiable_pending);
> the migrate thread checks the non_postcopiable_pending size to
> decide when it can switch to postcopy, and performs a call to the complete
> method on each device before it does.  Thus the block migration should
> be finished before we start doing the actual postcopy stage, and thus
> before the destination CPU starts running.

I mean that as it is right now, the info under the blk_mig_active() check
would be zero/the same as before entering postcopy.

>
> A possibly harder question is what happens if block.c did implement
> postcopy and you had both block postcopy and ram postcopy active at
> the same time; again I think it should work but I'm not sure if one
> would starve the other.
>
>> Once here, I guess we can get the migrate_already_active() bit without
>> problem?
>
> I'm not sure of the question here; but the idea of migration_already_active()
> is just to avoid all of the open-coded checks for each possible state;
> now that we've added another state they were getting messy.

Sorry.  I mean that the migrate_already_active() bits can get in without
further ado.  Don't need to wait for postcopy to be integrated.

>
> Dave
>
>> 
>> Later, Juan.
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 21/42] postcopy: OS support test
  2015-07-13 11:20   ` Juan Quintela
@ 2015-07-13 16:31     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-13 16:31 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Provide a check to see if the OS we're running on has all the bits
> > needed for postcopy.
> >
> > Creates postcopy-ram.c which will get most of the other helpers we need.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> 
> I am guessing that test is ok, but we are doing the test each time that
> we call the function.  We always end up calling that kind of function in
> several places.  Wouldn't it be good to rename the function to
> __postcopy_ram_supported_by_host()
> 
> and do a toplevel function that is:
> 
> bool postcopy_ram_supported_by_host(void)
> {
>         static bool first_time = true;
>         static bool supported = false;
> 
>         if (first_time) {
>            first_time = false;
>            supported = __postcopy_ram_supported_by_host();
>         }
>         return supported;
> }
> 
> Notice that I don't know how slow the mmap + userfault thing is, but I
> guess that the values would not change while running, no?

Since we only call this once, at the start of an incoming migration,
it seems overkill to do that.

Dave

> 
> It has a review-by because I don't see anything wrong with it.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 23/42] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
  2015-07-13 16:26       ` Juan Quintela
@ 2015-07-13 16:48         ` Dr. David Alan Gilbert
  2015-07-13 18:05           ` Juan Quintela
  0 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-13 16:48 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > * Juan Quintela (quintela@redhat.com) wrote:
> >> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> >> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >> >
> >> > 'MIGRATION_STATUS_POSTCOPY_ACTIVE' is entered after migrate_start_postcopy
> >> >
> >> > 'migration_postcopy_phase' is provided for other sections to know if
> >> > they're in postcopy.
> >> >
> >> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> >> > Reviewed-by: Eric Blake <eblake@redhat.com>
> >> 
> >> Reviewed-by: Juan Quintela <quintela@redhat.com>
> >> 
> >> 
> >> But (there is always a but....)
> >> 
> >> 
> >> > @@ -358,6 +359,39 @@ MigrationInfo *qmp_query_migrate(Error **errp)
> >> >  
> >> >          get_xbzrle_cache_stats(info);
> >> >          break;
> >> > +    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> >> > +        /* Mostly the same as active; TODO add some postcopy stats */
> >> > +        info->has_status = true;
> >> > +        info->has_total_time = true;
> >> > +        info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME)
> >> > +            - s->total_time;
> >> > +        info->has_expected_downtime = true;
> >> > +        info->expected_downtime = s->expected_downtime;
> >> > +        info->has_setup_time = true;
> >> > +        info->setup_time = s->setup_time;
> >> > +
> >> > +        info->has_ram = true;
> >> > +        info->ram = g_malloc0(sizeof(*info->ram));
> >> > +        info->ram->transferred = ram_bytes_transferred();
> >> > +        info->ram->remaining = ram_bytes_remaining();
> >> > +        info->ram->total = ram_bytes_total();
> >> > +        info->ram->duplicate = dup_mig_pages_transferred();
> >> > +        info->ram->skipped = skipped_mig_pages_transferred();
> >> > +        info->ram->normal = norm_mig_pages_transferred();
> >> > +        info->ram->normal_bytes = norm_mig_bytes_transferred();
> >> > +        info->ram->dirty_pages_rate = s->dirty_pages_rate;
> >> > +        info->ram->mbps = s->mbps;
> >> > +
> >> > +        if (blk_mig_active()) {
> >> > +            info->has_disk = true;
> >> > +            info->disk = g_malloc0(sizeof(*info->disk));
> >> > +            info->disk->transferred = blk_mig_bytes_transferred();
> >> > +            info->disk->remaining = blk_mig_bytes_remaining();
> >> > +            info->disk->total = blk_mig_bytes_total();
> >> > +        }
> >> 
> >> Can we have block migration active with postcopy?  I would assume that
> >> this would get disk corruption, no?  Or if you prefer the other
> >> question, what protects us from disk corruption?
> >
> > I think you can, I've not tried it; however I also think it should
> > be safe.
> >
> >  migration/block.c's block_save_pending always puts a value in the
> > non_postcopiable_pending return value (and 0 in the postcopiable_pending);
> > the migrate thread checks the non_postcopiable_pending size to
> > decide when it can switch to postcopy, and performs a call to the complete
> > method on each device before it does.  Thus the block migration should
> > be finished before we start doing the actual postcopy stage, and thus
> > before the destination CPU starts running.
> 
> I mean that as it is right now, the info under the blk_mig_active() check
> would be zero/the same as before entering postcopy.

Ah, yes;  would blk_mig_bytes_total/transferred still have valid values you
would want to display, even at the end of the block migration phase?

> >
> > A possibly harder question is what happens if block.c did implement
> > postcopy and you had both block postcopy and ram postcopy active at
> > the same time; again I think it should work but I'm not sure if one
> > would starve the other.
> >
> >> Once here, I guess we can get the migrate_already_active() bit without
> >> problem?
> >
> > I'm not sure of the question here; but the idea of migration_already_active()
> > is just to avoid all of the open-coded checks for each possible state;
> > now that we've added another state they were getting messy.
> 
> Sorry.  I mean that the migrate_already_active() bits can get in without
> further ado.  Don't need to wait for postcopy to be integrated.

Yes; do you want it split out?

Dave

> 
> >
> > Dave
> >
> >> 
> >> Later, Juan.
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 22/42] migrate_start_postcopy: Command to trigger transition to postcopy
  2015-07-13 11:23   ` Juan Quintela
@ 2015-07-13 17:13     ` Dr. David Alan Gilbert
  2015-07-13 18:07       ` Juan Quintela
  0 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-13 17:13 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Once postcopy is enabled (with migrate_set_capability), the migration
> > will still start on precopy mode.  To cause a transition into postcopy
> > the:
> >
> >   migrate_start_postcopy
> >
> > command must be issued.  Postcopy will start sometime after this
> > (when it's next checked in the migration loop).
> >
> > Issuing the command before migration has started will error,
> > and issuing after it has finished is ignored.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: Eric Blake <eblake@redhat.com>
> 
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index a5951ac..e973490 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -111,6 +111,9 @@ struct MigrationState
> >      int64_t xbzrle_cache_size;
> >      int64_t setup_time;
> >      int64_t dirty_sync_count;
> > +
> > +    /* Flag set once the migration has been asked to enter postcopy */
> > +    bool start_postcopy;
> >  };
> >  
> >  void process_incoming_migration(QEMUFile *f);
> > diff --git a/migration/migration.c b/migration/migration.c
> > index e77b8b4..6fc47f9 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -465,6 +465,28 @@ void qmp_migrate_set_parameters(bool has_compress_level,
> >      }
> >  }
> >  
> > +void qmp_migrate_start_postcopy(Error **errp)
> > +{
> > +    MigrationState *s = migrate_get_current();
> > +
> > +    if (!migrate_postcopy_ram()) {
> > +        error_setg(errp, "Enable postcopy with migration_set_capability before"
> > +                         " the start of migration");
> > +        return;
> > +    }
> > +
> > +    if (s->state == MIGRATION_STATUS_NONE) {
> 
> I would claim that this check should be:
> 
>     if (s->state != MIGRATION_STATUS_ACTIVE) {
> ??
> 
> FAILED, COMPLETED, CANCELL* don't make sense, right?

What I'm trying to catch here is people doing:
     migrate_start_postcopy
     migrate tcp:pppp:whereever

  which won't work, because migrate_init reinitialises
the flag that migrate_start_postcopy previously set.

However, I also don't want to create a race, since what you do is
typically:
     migrate  tcp:pppp:whereever
   <wait some time, get bored>
     migrate_start_postcopy

if you're unlucky, and the migration finishes just
at the same time you do the migrate_start_postcopy, do you
want migrate_start_postcopy to fail?  My guess was it
was best for it not to fail, in this case.
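
A minimal sketch of the behaviour described above: error before the
migration has started, set the flag while it is running, and quietly do
nothing once it has finished.  The enum, struct and
request_start_postcopy() names are hypothetical and only illustrate the
idea; the patch itself only tests for MIGRATION_STATUS_NONE, and the
explicit "ignored" result is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the migration states. */
enum mig_state {
    MIG_NONE, MIG_SETUP, MIG_ACTIVE, MIG_COMPLETED, MIG_FAILED, MIG_CANCELLED
};

struct mig {
    enum mig_state state;
    bool start_postcopy;   /* set once postcopy has been requested */
};

enum start_result { START_ERROR, START_IGNORED, START_REQUESTED };

static enum start_result request_start_postcopy(struct mig *m)
{
    assert(m != NULL);

    switch (m->state) {
    case MIG_NONE:
        /* Issued before 'migrate': the flag would be reset by init. */
        return START_ERROR;
    case MIG_SETUP:
    case MIG_ACTIVE:
        /* Picked up the next time the migration loop checks the flag. */
        m->start_postcopy = true;
        return START_REQUESTED;
    default:
        /* Raced with the end of migration: not an error, just a no-op. */
        return START_IGNORED;
    }
}
```

Treating the finished states as a no-op rather than an error is what
avoids the race for someone who issues the command just as the
migration completes.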

Dave

> 
> Thanks, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 27/42] postcopy: ram_enable_notify to switch on userfault
  2015-07-13 12:10   ` Juan Quintela
@ 2015-07-13 17:36     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-13 17:36 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Mark the area of RAM as 'userfault'
> > Start up a fault-thread to handle any userfaults we might receive
> > from it (to be filled in later)
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  include/migration/migration.h    |  3 ++
> >  include/migration/postcopy-ram.h |  6 ++++
> >  migration/postcopy-ram.c         | 69 +++++++++++++++++++++++++++++++++++++++-
> >  migration/savevm.c               |  9 ++++++
> >  4 files changed, 86 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index 98e2568..e6585c5 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -69,6 +69,9 @@ struct MigrationIncomingState {
> >       */
> >      QemuEvent      main_thread_load_event;
> >  
> > +    QemuThread     fault_thread;
> 
> Best name ever?  Well, it could be "wrong_thread" O:-)

Well they are 'user faults'!   Not that we blame the user for
them personally of course.

> > +static void *postcopy_ram_fault_thread(void *opaque)
> > +{
> > +    MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
> 
> Uneeded cast.

Thanks, gone.

> > +
> > +    fprintf(stderr, "postcopy_ram_fault_thread\n");
> > +    /* TODO: In later patch */
> > +    qemu_sem_post(&mis->fault_thread_sem);
> > +    while (1) {
> > +        /* TODO: In later patch */
> > +    }
> > +
> > +    return NULL;
> > +}
> > +
> > +int postcopy_ram_enable_notify(MigrationIncomingState *mis)
> > +{
> > +    /* Create the fault handler thread and wait for it to be ready */
> > +    qemu_sem_init(&mis->fault_thread_sem, 0);
> > +    qemu_thread_create(&mis->fault_thread, "postcopy/fault",
> > +                       postcopy_ram_fault_thread, mis, QEMU_THREAD_JOINABLE);
> > +    qemu_sem_wait(&mis->fault_thread_sem);
> > +    qemu_sem_destroy(&mis->fault_thread_sem);
> > +
> > +    /* Mark so that we get notified of accesses to unwritten areas */
> > +    if (qemu_ram_foreach_block(ram_block_enable_notify, mis)) {
> > +        return -1;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  #else
> >  /* No target OS support, stubs just fail */
> > -
> 
> This belongs in a different patch O:-)

Thanks, gone.

> If you have to resend, just change them.
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 28/42] Postcopy: Postcopy startup in migration thread
  2015-07-13 12:56   ` Juan Quintela
@ 2015-07-13 17:56     ` Dr. David Alan Gilbert
  2015-07-13 18:09       ` Juan Quintela
  2015-07-23  5:53       ` Amit Shah
  0 siblings, 2 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-13 17:56 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Rework the migration thread to setup and start postcopy.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/migration.h |   3 +
> >  migration/migration.c         | 166 ++++++++++++++++++++++++++++++++++++++++--
> >  trace-events                  |   4 +
> >  3 files changed, 167 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index e6585c5..68a1731 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -120,6 +120,9 @@ struct MigrationState
> >      /* Flag set once the migration has been asked to enter postcopy */
> >      bool start_postcopy;
> >  
> > +    /* Flag set once the migration thread is running (and needs joining) */
> > +    bool started_migration_thread;
> > +
> 
> migration_thread_started?

Changed.

> > +
> > +    /*
> > +     * send rest of state - note things that are doing postcopy
> > +     * will notice we're in POSTCOPY_ACTIVE and not actually
> > +     * wrap their state up here
> > +     */
> > +    qemu_file_set_rate_limit(ms->file, INT64_MAX);
> 
> Do we undo this?  or, are we sure that it is ok to maximize network
> output?

No we don't undo it;  it's a good question what we can do better.
I'm trying to avoid delaying the postcopy-requested pages; ideally
I'd like to separate those out so they get satisfied but still
meet the bandwidth limit for the background transfer.
The ideal is separate fd's, however something else I've considered
is getting incoming postcopy requests to wake the outgoing side
up when it's sleeping for the bandwidth limit, although I've
not tried implementing that yet.

> > +    /* Ping just for debugging, helps line traces up */
> > +    qemu_savevm_send_ping(ms->file, 2);
> 
> Change the values 1, 2, 3 to constants?

Any suggestions for names?  They're purely for debugging, so you can
match them up on the destination.

> > +     * We need to leave the fd free for page transfers during the
> > +     * loading of the device state, so wrap all the remaining
> > +     * commands and state into a package that gets sent in one go
> > +     */
> > +    QEMUFile *fb = qemu_bufopen("w", NULL);
> > +    if (!fb) {
> > +        error_report("Failed to create buffered file");
> > +        goto fail;
> > +    }
> > +
> > +    qemu_savevm_state_complete_precopy(fb);
> > +    qemu_savevm_send_ping(fb, 3);
> > +
> > +    qemu_savevm_send_postcopy_run(fb);
> > +
> > +    /* <><> end of stuff going into the package */
> > +    qsb = qemu_buf_get(fb);
> > +
> > +    /* Now send that blob */
> > +    if (qemu_savevm_send_packaged(ms->file, qsb)) {
> > +        goto fail_closefb;
> > +    }
> > +    qemu_fclose(fb);
> 
> Why can't we send this directly without the extra copy?
> I guess that there are some missing/extra section starts/end whatever?
> Anything specific?

The problem is that the destination has to be able to read the chunk
of migration stream off the fd and leave the fd free for page requests
that may be required during loading the device state.
Since the migration-stream is unstructured, there is no way to read
a chunk of stream off without knowing the length of that chunk, and the
only way to know that length is to write it to a buffer and then see
how big it is.
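
A rough sketch of that buffering idea: the payload is first written
into a memory buffer, and only then emitted with a length header in
front so the reader can pull the whole chunk off the fd in one go.
The byte_buf type, buf_append() and send_packaged() are hypothetical
helpers, and the 4-byte big-endian header is an assumption for
illustration, not a statement of the actual wire format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer standing in for a buffered file. */
struct byte_buf {
    uint8_t *data;
    size_t len;
    size_t cap;
};

static int buf_append(struct byte_buf *b, const void *src, size_t n)
{
    if (n == 0) {
        return 0;
    }
    if (n > b->cap - b->len) {
        size_t new_cap = b->cap ? b->cap : 64;
        while (new_cap - b->len < n) {
            new_cap *= 2;
        }
        uint8_t *p = realloc(b->data, new_cap);
        if (!p) {
            return -1;
        }
        b->data = p;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Emit "length, then payload"; the length is only known after buffering. */
static int send_packaged(struct byte_buf *wire, const struct byte_buf *payload)
{
    if (payload->len > UINT32_MAX) {
        return -1;
    }
    uint32_t len = (uint32_t)payload->len;
    uint8_t hdr[4] = {
        (uint8_t)(len >> 24), (uint8_t)(len >> 16),
        (uint8_t)(len >> 8), (uint8_t)len
    };
    if (buf_append(wire, hdr, sizeof(hdr))) {
        return -1;
    }
    return buf_append(wire, payload->data, payload->len);
}
```

The extra copy is the price of the length header: the reader can then
consume exactly that many bytes and leave the fd free for page
traffic.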

> > +    ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
> 
> Now, that we are here, is there a counter of the time that takes the
> postcopy stage?  Just curious.

No, not separate.

> > +/*
> >   * Master migration thread on the source VM.
> >   * It drives the migration and pumps the data down the outgoing channel.
> >   */
> >  static void *migration_thread(void *opaque)
> >  {
> >      MigrationState *s = opaque;
> > +    /* Used by the bandwidth calcs, updated later */
> >      int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> >      int64_t initial_bytes = 0;
> >      int64_t max_size = 0;
> >      int64_t start_time = initial_time;
> >      bool old_vm_running = false;
> > +    bool entered_postcopy = false;
> > +    /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
> > +    enum MigrationStatus current_active_type = MIGRATION_STATUS_ACTIVE;
> 
> current_active_state?

Changed.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 23/42] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
  2015-07-13 16:48         ` Dr. David Alan Gilbert
@ 2015-07-13 18:05           ` Juan Quintela
  0 siblings, 0 replies; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 18:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>> > * Juan Quintela (quintela@redhat.com) wrote:
>> >> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
>> >> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> >> >
>> >> > 'MIGRATION_STATUS_POSTCOPY_ACTIVE' is entered after migrate_start_postcopy
>> >> >
>> >> > 'migration_postcopy_phase' is provided for other sections to know if
>> >> > they're in postcopy.
>> >> >
>> >> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> >> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
>> >> > Reviewed-by: Eric Blake <eblake@redhat.com>
>> >> 
>> >> Reviewed-by: Juan Quintela <quintela@redhat.com>
>> >> 
>> >> 
>> >> But (there is always a but....)
>> >> 
>> >> 
>> >> > @@ -358,6 +359,39 @@ MigrationInfo *qmp_query_migrate(Error **errp)
>> >> >  
>> >> >          get_xbzrle_cache_stats(info);
>> >> >          break;
>> >> > +    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
>> >> > +        /* Mostly the same as active; TODO add some postcopy stats */
>> >> > +        info->has_status = true;
>> >> > +        info->has_total_time = true;
>> >> > +        info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME)
>> >> > +            - s->total_time;
>> >> > +        info->has_expected_downtime = true;
>> >> > +        info->expected_downtime = s->expected_downtime;
>> >> > +        info->has_setup_time = true;
>> >> > +        info->setup_time = s->setup_time;
>> >> > +
>> >> > +        info->has_ram = true;
>> >> > +        info->ram = g_malloc0(sizeof(*info->ram));
>> >> > +        info->ram->transferred = ram_bytes_transferred();
>> >> > +        info->ram->remaining = ram_bytes_remaining();
>> >> > +        info->ram->total = ram_bytes_total();
>> >> > +        info->ram->duplicate = dup_mig_pages_transferred();
>> >> > +        info->ram->skipped = skipped_mig_pages_transferred();
>> >> > +        info->ram->normal = norm_mig_pages_transferred();
>> >> > +        info->ram->normal_bytes = norm_mig_bytes_transferred();
>> >> > +        info->ram->dirty_pages_rate = s->dirty_pages_rate;
>> >> > +        info->ram->mbps = s->mbps;
>> >> > +
>> >> > +        if (blk_mig_active()) {
>> >> > +            info->has_disk = true;
>> >> > +            info->disk = g_malloc0(sizeof(*info->disk));
>> >> > +            info->disk->transferred = blk_mig_bytes_transferred();
>> >> > +            info->disk->remaining = blk_mig_bytes_remaining();
>> >> > +            info->disk->total = blk_mig_bytes_total();
>> >> > +        }
>> >> 
>> >> Can we have block migration active with postcopy?  I would assume that
>> >> this would get disk corruption, no?  Or if you preffer the other
>> >> question, what protects us from disk corruption?
>> >
>> > I think you can, I've not tried it; however I also think it should
>> > be safe.
>> >
>> >  migration/block.c's block_save_pending always puts a value in the
>> > non_postcopiable_pending return value (and 0 in the postcopiable_pending);
>> > the migrate thread checks the non_postcopiable_pending size to
>> > decide when it can switch to postcopy, and performs a call to the complete
>> > method on each device before it does.  Thus the block migration should
>> > be finished before we start doing the actual postcopy stage, and thus
>> > before the destination CPU starts running.
>> 
>> I mean that as it is right now, the info under blk_mig_active() check
>> would be zero/the same than before entering postcopy.
>
> Ah, yes;  would blk_mig_bytes_total/transferred still have valid values you
> would want to display, even at the end of the block migration phase?
>
>> >
>> > A possibly harder question is what happens if block.c did implement
>> > postcopy and you had both block postcopy and ram postcopy active at
>> > the same time; again I think it should work but I'm not sure if one
>> > would starve the other.
>> >
>> >> Once here, I guess we can get the migrate_already_active() bit without
>> >> problem?
>> >
>> > I'm not sure of the question here; but the idea of migration_already_active()
>> > is just to avoid all of the open-coded checks for each possible state;
>> > now we've added anothe state they were getting messy.
>> 
>> Sorry.  I mean that the migrate_already_active() bits can get in without
>> further ado.  Don't need to wait for postcopy to be integrated.
>
> Yes; do you want it split out?

Whatever is easier for you.  I mean that it can be integrated
independently of postcopy.

so, it is up to you.

Juan.

>
> Dave
>
>> 
>> >
>> > Dave
>> >
>> >> 
>> >> Later, Juan.
>> > --
>> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 22/42] migrate_start_postcopy: Command to trigger transition to postcopy
  2015-07-13 17:13     ` Dr. David Alan Gilbert
@ 2015-07-13 18:07       ` Juan Quintela
  2015-07-21  7:40         ` Amit Shah
  2015-09-24 14:20         ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 18:07 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
>> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
>> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> >
>> > Once postcopy is enabled (with migrate_set_capability), the migration
>> > will still start on precopy mode.  To cause a transition into postcopy
>> > the:
>> >
>> >   migrate_start_postcopy
>> >
>> > command must be issued.  Postcopy will start sometime after this
>> > (when it's next checked in the migration loop).
>> >
>> > Issuing the command before migration has started will error,
>> > and issuing after it has finished is ignored.
>> >
>> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> > Reviewed-by: Eric Blake <eblake@redhat.com>
>> 
>> > diff --git a/include/migration/migration.h b/include/migration/migration.h
>> > index a5951ac..e973490 100644
>> > --- a/include/migration/migration.h
>> > +++ b/include/migration/migration.h
>> > @@ -111,6 +111,9 @@ struct MigrationState
>> >      int64_t xbzrle_cache_size;
>> >      int64_t setup_time;
>> >      int64_t dirty_sync_count;
>> > +
>> > +    /* Flag set once the migration has been asked to enter postcopy */
>> > +    bool start_postcopy;
>> >  };
>> >  
>> >  void process_incoming_migration(QEMUFile *f);
>> > diff --git a/migration/migration.c b/migration/migration.c
>> > index e77b8b4..6fc47f9 100644
>> > --- a/migration/migration.c
>> > +++ b/migration/migration.c
>> > @@ -465,6 +465,28 @@ void qmp_migrate_set_parameters(bool has_compress_level,
>> >      }
>> >  }
>> >  
>> > +void qmp_migrate_start_postcopy(Error **errp)
>> > +{
>> > +    MigrationState *s = migrate_get_current();
>> > +
>> > +    if (!migrate_postcopy_ram()) {
>> > +        error_setg(errp, "Enable postcopy with migration_set_capability before"
>> > +                         " the start of migration");
>> > +        return;
>> > +    }
>> > +
>> > +    if (s->state == MIGRATION_STATUS_NONE) {
>> 
>> I would claim that this check should be:
>> 
>>     if (s->state != MIGRATION_STATUS_ACTIVE) {
>> ??
>> 
>> FAILED, COMPLETED, CANCELL* don't make sense, right?
>
> What I'm trying to catch here is people doing:
>      migrate_start_postcopy
>      migrate tcp:pppp:whereever
>
>   which wont work, because migrate_init reinitialises
> the flag that start previously set.
>
> However, I also don't want to create a race, since what you do is
> typically:
>      migrate  tcp:pppp:whereever
>    <wait some time, get bored>
>      migrate_start_postcopy
>
> if you're unlucky, and the migration finishes just
> at the same time you do the migrate_start_postcopy, do you
> want migrate_start_postcopy to fail?  My guess was it
> was best for it not to fail, in this case.

Change the order: if it is ACTIVE, do the postcopy thing; otherwise, do
the clause that is protected now?  Moving to postcopy only makes sense if
we are in ACTIVE.

Later, Juan.


>
> Dave
>
>> 
>> Thanks, Juan.
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 28/42] Postcopy: Postcopy startup in migration thread
  2015-07-13 17:56     ` Dr. David Alan Gilbert
@ 2015-07-13 18:09       ` Juan Quintela
  2015-09-23 17:56         ` Dr. David Alan Gilbert
  2015-07-23  5:53       ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-13 18:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

>> > +
>> > +    /*
>> > +     * send rest of state - note things that are doing postcopy
>> > +     * will notice we're in POSTCOPY_ACTIVE and not actually
>> > +     * wrap their state up here
>> > +     */
>> > +    qemu_file_set_rate_limit(ms->file, INT64_MAX);
>> 
>> Do we undo this?  or, are we sure that it is ok to maximize network
>> output?
>
> No we don't undo it;  it's a good question what we can do better.
> I'm trying to avoid delaying the postcopy-requested pages; ideally
> I'd like to separate those out so they get satisfied but still
> meet the bandwidth limit for the background transfer.
> The ideal is separate fd's, however something else I've considered
> is getting incoming postcopy requests to wake the outgoing side
> up when it's sleeping for the bandwidth limit, although I've
> not tried implementing that yet.

I see.

>
>> > +    /* Ping just for debugging, helps line traces up */
>> > +    qemu_savevm_send_ping(ms->file, 2);
>> 
>> Change the values 1, 2, 3 to constants?
>
> Suggestions to names? - they purely for debugging so you can
> match it up on the destination.
>
>> > +     * We need to leave the fd free for page transfers during the
>> > +     * loading of the device state, so wrap all the remaining
>> > +     * commands and state into a package that gets sent in one go
>> > +     */
>> > +    QEMUFile *fb = qemu_bufopen("w", NULL);
>> > +    if (!fb) {
>> > +        error_report("Failed to create buffered file");
>> > +        goto fail;
>> > +    }
>> > +
>> > +    qemu_savevm_state_complete_precopy(fb);
>> > +    qemu_savevm_send_ping(fb, 3);
>> > +
>> > +    qemu_savevm_send_postcopy_run(fb);
>> > +
>> > +    /* <><> end of stuff going into the package */
>> > +    qsb = qemu_buf_get(fb);
>> > +
>> > +    /* Now send that blob */
>> > +    if (qemu_savevm_send_packaged(ms->file, qsb)) {
>> > +        goto fail_closefb;
>> > +    }
>> > +    qemu_fclose(fb);
>> 
>> Why can't we send this directly without the extra copy?
>> I guess that there are some missing/extra section starts/end whatever?
>> Anything specific?
>
> The problem is that the destination has to be able to read the chunk
> of migration stream off the fd and leave the fd free for page requests
> that may be required during loading the device state.
> Since the migration-stream is unstructured, there is no way to read
> a chunk of stream off without knowing the length of that chunk, and the
> only way to know that chunk is to write it to a buffer and then see
> how big it is.

Arghhh.  ok.  Comment?

>
>> > +    ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
>> 
>> Now, that we are here, is there a counter of the time that takes the
>> postcopy stage?  Just curious.
>
> No, not separate.
>
>> > +/*
>> >   * Master migration thread on the source VM.
>> >   * It drives the migration and pumps the data down the outgoing channel.
>> >   */
>> >  static void *migration_thread(void *opaque)
>> >  {
>> >      MigrationState *s = opaque;
>> > +    /* Used by the bandwidth calcs, updated later */
>> >      int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> >      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>> >      int64_t initial_bytes = 0;
>> >      int64_t max_size = 0;
>> >      int64_t start_time = initial_time;
>> >      bool old_vm_running = false;
>> > +    bool entered_postcopy = false;
>> > +    /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
>> > +    enum MigrationStatus current_active_type = MIGRATION_STATUS_ACTIVE;
>> 
>> current_active_state?
>
> Changed.
>
> Dave
>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 31/42] Page request: Process incoming page request
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 31/42] Page request: Process incoming page request Dr. David Alan Gilbert (git)
@ 2015-07-14  9:18   ` Juan Quintela
  2015-08-06 10:45     ` Dr. David Alan Gilbert
  2015-07-23 12:23   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-14  9:18 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> On receiving MIG_RPCOMM_REQ_PAGES look up the address and
> queue the page.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>


>      migrate_fd_cleanup_src_rp(s);
>  
> +    /* This queue generally should be empty - but in the case of a failed
> +     * migration might have some droppings in.
> +     */
> +    struct MigrationSrcPageRequest *mspr, *next_mspr;
> +    QSIMPLEQ_FOREACH_SAFE(mspr, &s->src_page_requests, next_req, next_mspr) {
> +        QSIMPLEQ_REMOVE_HEAD(&s->src_page_requests, next_req);

How nice of QSIMPLEQ.  To remove elements you don't use mspr....

> +        g_free(mspr);
> +    }
> +
>      if (s->file) {
>          trace_migrate_fd_cleanup();
>          qemu_mutex_unlock_iothread();
> @@ -713,6 +729,8 @@ MigrationState *migrate_init(const MigrationParams *params)
>      s->state = MIGRATION_STATUS_SETUP;
>      trace_migrate_set_state(MIGRATION_STATUS_SETUP);
>  
> +    QSIMPLEQ_INIT(&s->src_page_requests);
> +
>      s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>      return s;
>  }
> @@ -976,7 +994,25 @@ static void source_return_path_bad(MigrationState *s)
>  static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
>                                         ram_addr_t start, ram_addr_t len)
>  {
> +    long our_host_ps = getpagesize();
> +
>      trace_migrate_handle_rp_req_pages(rbname, start, len);
> +
> +    /*
> +     * Since we currently insist on matching page sizes, just sanity check
> +     * we're being asked for whole host pages.
> +     */
> +    if (start & (our_host_ps-1) ||
> +       (len & (our_host_ps-1))) {


I don't know if creating a macro is a good idea?
#define HOST_ALIGN_CHECK(addr)  (addr & (getpagesize()-1))

???

Don't we have a macro for this in qemu?
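
Something along these lines, as a sketch only: IS_ALIGNED_TO and
request_is_host_page_aligned() are made-up names, and the page size is
passed in as a parameter (rather than calling getpagesize() inside the
macro) purely to keep the example self-contained.  The arguments are
fully parenthesised, which the one-liner above is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when value is a multiple of page_size (a power of two). */
#define IS_ALIGNED_TO(value, page_size) \
    ((((uint64_t)(value)) & ((uint64_t)(page_size) - 1)) == 0)

/* Both the start and the length of a request must be whole host pages. */
static bool request_is_host_page_aligned(uint64_t start, uint64_t len,
                                         uint64_t page_size)
{
    /* The mask trick only works for a non-zero power of two. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    return IS_ALIGNED_TO(start, page_size) && IS_ALIGNED_TO(len, page_size);
}
```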

> index f7d957e..da3e9ea 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -924,6 +924,69 @@ static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block,
>  }
>  
>  /**
> + * Queue the pages for transmission, e.g. a request from postcopy destination
> + *   ms: MigrationStatus in which the queue is held
> + *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
> + *   start: Offset from the start of the RAMBlock
> + *   len: Length (in bytes) to send
> + *   Return: 0 on success
> + */
> +int ram_save_queue_pages(MigrationState *ms, const char *rbname,
> +                         ram_addr_t start, ram_addr_t len)
> +{
> +    RAMBlock *ramblock;
> +
> +    rcu_read_lock();
> +    if (!rbname) {
> +        /* Reuse last RAMBlock */
> +        ramblock = ms->last_req_rb;
> +
> +        if (!ramblock) {
> +            /*
> +             * Shouldn't happen, we can't reuse the last RAMBlock if
> +             * it's the 1st request.
> +             */
> +            error_report("ram_save_queue_pages no previous block");
> +            goto err;
> +        }
> +    } else {
> +        ramblock = ram_find_block(rbname);
> +
> +        if (!ramblock) {
> +            /* We shouldn't be asked for a non-existent RAMBlock */
> +            error_report("ram_save_queue_pages no block '%s'", rbname);
> +            goto err;
> +        }

       Here?

> +    }
> +    trace_ram_save_queue_pages(ramblock->idstr, start, len);
> +    if (start+len > ramblock->used_length) {
> +        error_report("%s request overrun start=%zx len=%zx blocklen=%zx",
> +                     __func__, start, len, ramblock->used_length);
> +        goto err;
> +    }
> +
> +    struct MigrationSrcPageRequest *new_entry =
> +        g_malloc0(sizeof(struct MigrationSrcPageRequest));
> +    new_entry->rb = ramblock;
> +    new_entry->offset = start;
> +    new_entry->len = len;
> +    ms->last_req_rb = ramblock;

Can we move this line to the else?

> +
> +    qemu_mutex_lock(&ms->src_page_req_mutex);
> +    memory_region_ref(ramblock->mr);

I haven't looked further in the patch series yet, but I can't see a
memory_region_unref in this patch ....  Don't we need it?

> +    QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
> +    qemu_mutex_unlock(&ms->src_page_req_mutex);
> +    rcu_read_unlock();

Of everything that we have inside the rcu_read_lock() .... Is there
anything other than the memory_region_ref() that needs rcu?

Would it not be possible to take the memory reference before taking the
mutex?

While we are here, do we care about calling malloc with the rcu read
lock held?  Or could we just call malloc at the beginning of the function
and free it on the err path if it turns out not to be needed?
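
To make the last suggestion concrete, a sketch of "allocate first,
free on the error path".  The block and page_request types and the
find_block()/make_request() helpers are invented for the example; the
comments mark where the rcu read-side section would sit, but no real
locking is done here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for RAMBlock and the queued request. */
struct block {
    const char *name;
    size_t used_length;
};

struct page_request {
    const struct block *rb;
    size_t offset;
    size_t len;
};

static const struct block *find_block(const struct block *blocks, size_t n,
                                      const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(blocks[i].name, name) == 0) {
            return &blocks[i];
        }
    }
    return NULL;
}

static struct page_request *make_request(const struct block *blocks, size_t n,
                                         const char *name,
                                         size_t offset, size_t len)
{
    assert(name != NULL);

    /* Allocate up front, outside any read-side critical section. */
    struct page_request *req = calloc(1, sizeof(*req));
    if (!req) {
        return NULL;
    }

    /* --- the read-side critical section would begin here --- */
    const struct block *rb = find_block(blocks, n, name);
    if (!rb || offset > rb->used_length || len > rb->used_length - offset) {
        goto err;
    }
    req->rb = rb;
    req->offset = offset;
    req->len = len;
    /* --- and end here, before the request is handed on --- */
    return req;

err:
    /* The early allocation turned out not to be needed. */
    free(req);
    return NULL;
}
```

The cost is one wasted allocation on the (hopefully rare) error path,
in exchange for a shorter critical section.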

Thanks, Juan.

* Re: [Qemu-devel] [PATCH v7 32/42] Page request: Consume pages off the post-copy queue
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 32/42] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
@ 2015-07-14  9:40   ` Juan Quintela
  2015-09-16 18:36     ` Dr. David Alan Gilbert
  2015-07-27  6:05   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-14  9:40 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> When transmitting RAM pages, consume pages that have been queued by
> MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.
>
> Note:
>   a) After a queued page the linear walk carries on from after the
> unqueued page; there is a reasonable chance that the destination
> was about to ask for other closeby pages anyway.
>
>   b) We have to be careful of any assumptions that the page walking
> code makes, in particular it does some short cuts on its first linear
> walk that break as soon as we do a queued page.
>
>   c) We have to be careful to not break up host-page size chunks, since
> this makes it harder to place the pages on the destination.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>


> +static bool last_was_from_queue;

Are we using this variable later in the series?

>  static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
>  {
>      migration_dirty_pages +=
> @@ -923,6 +933,41 @@ static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block,
>      return pages;
>  }
>  
> +/*
> + * Unqueue a page from the queue fed by postcopy page requests
> + *
> + * Returns:      The RAMBlock* to transmit from (or NULL if the queue is empty)
> + *      ms:      MigrationState in
> + *  offset:      the byte offset within the RAMBlock for the start of the page
> + * ram_addr_abs: global offset in the dirty/sent bitmaps
> + */
> +static RAMBlock *ram_save_unqueue_page(MigrationState *ms, ram_addr_t *offset,
> +                                       ram_addr_t *ram_addr_abs)
> +{
> +    RAMBlock *result = NULL;
> +    qemu_mutex_lock(&ms->src_page_req_mutex);
> +    if (!QSIMPLEQ_EMPTY(&ms->src_page_requests)) {
> +        struct MigrationSrcPageRequest *entry =
> +                                    QSIMPLEQ_FIRST(&ms->src_page_requests);
> +        result = entry->rb;
> +        *offset = entry->offset;
> +        *ram_addr_abs = (entry->offset + entry->rb->offset) & TARGET_PAGE_MASK;
> +
> +        if (entry->len > TARGET_PAGE_SIZE) {
> +            entry->len -= TARGET_PAGE_SIZE;
> +            entry->offset += TARGET_PAGE_SIZE;
> +        } else {
> +            memory_region_unref(result->mr);

Here is the unref, but I still don't understand why we don't need
one in the error case of the previous patch.

> +            QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
> +            g_free(entry);
> +        }
> +    }
> +    qemu_mutex_unlock(&ms->src_page_req_mutex);
> +
> +    return result;
> +}
> +
> +
>  /**
>   * Queue the pages for transmission, e.g. a request from postcopy destination
>   *   ms: MigrationStatus in which the queue is held
> @@ -987,6 +1032,58 @@ err:
>  

> @@ -997,65 +1094,102 @@ err:
>   * @f: QEMUFile where to send the data
>   * @last_stage: if we are at the completion stage
>   * @bytes_transferred: increase it with the number of transferred bytes
> + *
> + * On systems where host-page-size > target-page-size it will send all the
> + * pages in a host page that are dirty.
>   */
>  
>  static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
>                                     uint64_t *bytes_transferred)
>  {
> +    MigrationState *ms = migrate_get_current();
>      RAMBlock *block = last_seen_block;
> +    RAMBlock *tmpblock;
>      ram_addr_t offset = last_offset;
> +    ram_addr_t tmpoffset;
>      bool complete_round = false;
>      int pages = 0;
> -    MemoryRegion *mr;
>      ram_addr_t dirty_ram_abs; /* Address of the start of the dirty page in
>                                   ram_addr_t space */
>  
> -    if (!block)
> +    if (!block) {
>          block = QLIST_FIRST_RCU(&ram_list.blocks);
> +        last_was_from_queue = false;
> +    }
>  
> -    while (true) {
> -        mr = block->mr;
> -        offset = migration_bitmap_find_and_reset_dirty(mr, offset,
> -                                                       &dirty_ram_abs);
> -        if (complete_round && block == last_seen_block &&
> -            offset >= last_offset) {
> -            break;
> -        }
> -        if (offset >= block->used_length) {
> -            offset = 0;
> -            block = QLIST_NEXT_RCU(block, next);
> -            if (!block) {
> -                block = QLIST_FIRST_RCU(&ram_list.blocks);
> -                complete_round = true;
> -                ram_bulk_stage = false;
> -                if (migrate_use_xbzrle()) {
> -                    /* If xbzrle is on, stop using the data compression at this
> -                     * point. In theory, xbzrle can do better than compression.
> -                     */
> -                    flush_compressed_data(f);
> -                    compression_switch = false;
> -                }
> +    while (true) { /* Until we send a block or run out of stuff to send */
> +        tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &dirty_ram_abs);

This function was already ugly.  You split it in the past, and this patch
makes it even more complicated.  Can we try adding something like a

ram_find_next_page() and move some of the code inside the while loop
there?

While here, can we agree to send the next N pages (if they are contiguous)
when we receive a queued request?  Yep, deciding N means testing and
measuring, and that can wait until this is integrated.

> +
> +        if (tmpblock) {
> +            /* We've got a block from the postcopy queue */
> +            trace_ram_find_and_save_block_postcopy(tmpblock->idstr,
> +                                                   (uint64_t)tmpoffset,
> +                                                   (uint64_t)dirty_ram_abs);
> +            /*
> +             * We're sending this page, and since it's postcopy nothing else
> +             * will dirty it, and we must make sure it doesn't get sent again
> +             * even if this queue request was received after the background
> +             * search already sent it.
> +             */
> +            if (!test_bit(dirty_ram_abs >> TARGET_PAGE_BITS,
> +                          migration_bitmap)) {

I think this test can be inside ram_save_unqueue_page()

I.e. rename to:

ram_save_get_next_queued_page()
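
A minimal sketch of that suggestion, folding the dirty-bitmap test into the
unqueue step so the caller only ever sees pages that still need sending.
Everything here is hypothetical: the sketch_ names, the fixed 4 KiB page size
and the plain singly linked queue stand in for the patch's
MigrationSrcPageRequest, TARGET_PAGE_SIZE and QSIMPLEQ, and the mutex and
memory-region refcounting are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_BITS 12                        /* assumed 4 KiB target page */
#define SKETCH_PAGE_SIZE ((uint64_t)1 << SKETCH_PAGE_BITS)
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * 8)

/* One outstanding request: a run of pages starting at 'offset'. */
struct sketch_request {
    uint64_t offset;               /* byte offset of the next page to send */
    uint64_t len;                  /* bytes left; a multiple of the page size */
    struct sketch_request *next;
};

struct sketch_queue {
    struct sketch_request *head;
};

static bool sketch_test_bit(const unsigned long *map, uint64_t nr)
{
    return (map[nr / SKETCH_BITS_PER_LONG] >> (nr % SKETCH_BITS_PER_LONG)) & 1ul;
}

/*
 * Pop target pages off the queue until one is found that is still dirty.
 * Returns true and stores its byte offset in *offset, or returns false once
 * the queue is empty.  Request storage stays owned by the caller.
 */
static bool sketch_get_next_queued_page(struct sketch_queue *q,
                                        const unsigned long *dirty,
                                        uint64_t *offset)
{
    while (q->head) {
        struct sketch_request *req = q->head;
        uint64_t page = req->offset;

        assert((page & (SKETCH_PAGE_SIZE - 1)) == 0);

        if (req->len > SKETCH_PAGE_SIZE) {
            /* More pages left in this request: advance it by one page. */
            req->len -= SKETCH_PAGE_SIZE;
            req->offset += SKETCH_PAGE_SIZE;
        } else {
            /* Last page of the request: drop it from the queue. */
            q->head = req->next;
        }

        if (sketch_test_bit(dirty, page >> SKETCH_PAGE_BITS)) {
            *offset = page;
            return true;
        }
        /* Already sent by the background scan: skip to the next page. */
    }
    return false;
}
```

With this shape the loop in ram_find_and_save_block() would only need to
check the return value; whether that is actually simpler once locking and
tracing are added back is untested.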


* Re: [Qemu-devel] [PATCH v7 33/42] postcopy_ram.c: place_page and helpers
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 33/42] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
@ 2015-07-14 10:05   ` Juan Quintela
  2015-07-27  6:11     ` Amit Shah
  2015-09-23 16:45     ` Dr. David Alan Gilbert
  2015-07-27  6:11   ` Amit Shah
  1 sibling, 2 replies; 209+ messages in thread
From: Juan Quintela @ 2015-07-14 10:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> postcopy_place_page (etc) provide a way for postcopy to place a page
> into guests memory atomically (using the copy ioctl on the ufd).
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> --- a/include/migration/postcopy-ram.h
> +++ b/include/migration/postcopy-ram.h
> @@ -69,4 +69,20 @@ void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
>  void postcopy_discard_send_finish(MigrationState *ms,
>                                    PostcopyDiscardState *pds);
>  
> +/*
> + * Place a page (from) at (host) efficiently
> + *    There are restrictions on how 'from' must be mapped, in general best
> + *    to use other postcopy_ routines to allocate.
> + * returns 0 on success
> + */
> +int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> +                        bool all_zero);
> +
> +/*
> + * Allocate a page of memory that can be mapped at a later point in time
> + * using postcopy_place_page
> + * Returns: Pointer to allocated page
> + */
> +void *postcopy_get_tmp_page(MigrationIncomingState *mis);
> +

I don't know whether this makes sense, but wouldn't it have been a good
idea to ask for the address that we want as a hint?  That could help with
fragmentation, no?

> +/*
> + * Place a host page (from) at (host) atomically
> + * all_zero: Hint that the page being placed is 0 throughout
> + * returns 0 on success
> + */
> +int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> +                        bool all_zero)

postcopy_place_page() and postcopy_place_zero_page()?  They just share a
trace point :p


> +{
> +    if (!all_zero) {
> +        struct uffdio_copy copy_struct;
> +
> +        copy_struct.dst = (uint64_t)(uintptr_t)host;
> +        copy_struct.src = (uint64_t)(uintptr_t)from;
> +        copy_struct.len = getpagesize();
> +        copy_struct.mode = 0;
> +
> +        /* copy also acks to the kernel waking the stalled thread up
> +         * TODO: We can inhibit that ack and only do it if it was requested
> +         * which would be slightly cheaper, but we'd have to be careful
> +         * of the order of updating our page state.
> +         */
> +        if (ioctl(mis->userfault_fd, UFFDIO_COPY, &copy_struct)) {
> +            int e = errno;
> +            error_report("%s: %s copy host: %p from: %p",
> +                         __func__, strerror(e), host, from);
> +
> +            return -e;
> +        }
> +    } else {
> +        struct uffdio_zeropage zero_struct;
> +
> +        zero_struct.range.start = (uint64_t)(uintptr_t)host;
> +        zero_struct.range.len = getpagesize();
> +        zero_struct.mode = 0;
> +
> +        if (ioctl(mis->userfault_fd, UFFDIO_ZEROPAGE, &zero_struct)) {
> +            int e = errno;
> +            error_report("%s: %s zero host: %p from: %p",
> +                         __func__, strerror(e), host, from);
> +
> +            return -e;
> +        }
> +    }
> +
> +    trace_postcopy_place_page(host, all_zero);
> +    return 0;
> +}

I really think that the userfault code should be in a Linux-specific
file, but that can be done later, so I will not insist O:-)

Later, Juan.


* Re: [Qemu-devel] [PATCH v7 34/42] Postcopy: Use helpers to map pages during migration
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 34/42] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
@ 2015-07-14 12:34   ` Juan Quintela
  2015-07-17 17:31     ` Dr. David Alan Gilbert
  2015-07-27  7:39   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-14 12:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> In postcopy, the destination guest is running at the same time
> as it's receiving pages; as we receive new pages we must put
> them into the guests address space atomically to avoid a running
> CPU accessing a partially written page.
>
> Use the helpers in postcopy-ram.c to map these pages.
>
> qemu_get_buffer_less_copy is used to avoid a copy out of qemu_file
> in the case that postcopy is going to do a copy anyway.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> @@ -1742,7 +1752,6 @@ static inline void *host_from_stream_offset(QEMUFile *f,
>              error_report("Ack, bad migration stream!");
>              return NULL;
>          }
> -

Doesn't belong here O:-)

>          return memory_region_get_ram_ptr(block->mr) + offset;
>      }
>  
> @@ -1881,6 +1890,16 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>      int flags = 0, ret = 0;
>      static uint64_t seq_iter;
>      int len = 0;
> +    /*
> +     * System is running in postcopy mode, page inserts to host memory must be
> +     * atomic
> +     */
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    bool postcopy_running = postcopy_state_get(mis) >=
> +                            POSTCOPY_INCOMING_LISTENING;
> +    void *postcopy_host_page = NULL;
> +    bool postcopy_place_needed = false;
> +    bool matching_page_sizes = qemu_host_page_size == TARGET_PAGE_SIZE;
>  
>      seq_iter++;
>  
> @@ -1896,13 +1915,57 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>      rcu_read_lock();
>      while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
>          ram_addr_t addr, total_ram_bytes;
> -        void *host;
> +        void *host = 0;
> +        void *page_buffer = 0;
> +        void *postcopy_place_source = 0;

NULL, NULL, NULL?

BTW, do we really need postcopy_place_source?  I think that just doing
s/postcopy_place_source/postcopy_host_page/ would do?

>          uint8_t ch;
> +        bool all_zero = false;
>  
>          addr = qemu_get_be64(f);
>          flags = addr & ~TARGET_PAGE_MASK;
>          addr &= TARGET_PAGE_MASK;
>  
> +        if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
> +                     RAM_SAVE_FLAG_XBZRLE)) {
> +            host = host_from_stream_offset(f, mis, addr, flags);
> +            if (!host) {
> +                error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
> +                ret = -EINVAL;
> +                break;
> +            }
> +            if (!postcopy_running) {
> +                page_buffer = host;
> +            } else {
> +                /*
> +                 * Postcopy requires that we place whole host pages atomically.
> +                 * To make it atomic, the data is read into a temporary page
> +                 * that's moved into place later.
> +                 * The migration protocol uses,  possibly smaller, target-pages
> +                 * however the source ensures it always sends all the components
> +                 * of a host page in order.
> +                 */
> +                if (!postcopy_host_page) {
> +                    postcopy_host_page = postcopy_get_tmp_page(mis);
> +                }
> +                page_buffer = postcopy_host_page +
> +                              ((uintptr_t)host & ~qemu_host_page_mask);
> +                /* If all TP are zero then we can optimise the place */
> +                if (!((uintptr_t)host & ~qemu_host_page_mask)) {

I don't understand the test, the comment or both :-(

How you get from that test to this being a page full of zeros is a
mystery to me :p

Head hurts; I will try to convince myself that the rest of the changes are ok.
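
A minimal sketch of what the mask expression in that test computes, assuming
qemu_host_page_mask has the usual ~(host_page_size - 1) shape. On that
assumption the test says nothing about zeros by itself: it is true only when
'host' points at the first target page of a host page. The sketch_ helpers
are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte offset of 'addr' within its host page (size must be a power of two). */
static uintptr_t sketch_offset_in_host_page(uintptr_t addr,
                                            uintptr_t host_page_size)
{
    /* Assumed to match the shape of qemu_host_page_mask. */
    uintptr_t host_page_mask = ~(host_page_size - 1);

    assert(host_page_size && (host_page_size & (host_page_size - 1)) == 0);
    return addr & ~host_page_mask;
}

/* True only for an address at the very start of a host page. */
static bool sketch_starts_host_page(uintptr_t addr, uintptr_t host_page_size)
{
    return sketch_offset_in_host_page(addr, host_page_size) == 0;
}
```

If that reading is right, the condition marks the point where per-host-page
state (such as an "all zero so far" flag) can be reset, which may be what the
comment was trying to say.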


* Re: [Qemu-devel] [PATCH v7 35/42] Don't sync dirty bitmaps in postcopy
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 35/42] Don't sync dirty bitmaps in postcopy Dr. David Alan Gilbert (git)
@ 2015-07-14 12:36   ` Juan Quintela
  2015-07-14 13:13     ` Dr. David Alan Gilbert
  2015-07-27  7:43   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-14 12:36 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Once we're in postcopy the source processors are stopped and memory
> shouldn't change any more, so there's no need to look at the dirty
> map.
>
> There are two notes to this:
>   1) If we do resync and a page had changed then the page would get
>      sent again, which the destination wouldn't allow (since it might
>      have also modified the page)
>   2) Before disabling this I'd seen very rare cases where a page had been
>      marked dirtied although the memory contents are apparently identical
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Reviewed-by: Juan Quintela <quintela@redhat.com>

But, in what patch do we sync the migration bitmap after changing to postcopy?

> ---
>  migration/ram.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 01a0ab4..5cff4d6 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1643,7 +1643,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>  {
>      rcu_read_lock();
>  
> -    migration_bitmap_sync();
> +    if (!migration_postcopy_phase(migrate_get_current())) {
> +        migration_bitmap_sync();
> +    }
>  
>      ram_control_before_iterate(f, RAM_CONTROL_FINISH);
>  
> @@ -1678,7 +1680,8 @@ static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
>  
>      remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;
>  
> -    if (remaining_size < max_size) {
> +    if (!migration_postcopy_phase(migrate_get_current()) &&
> +        remaining_size < max_size) {
>          qemu_mutex_lock_iothread();
>          rcu_read_lock();
>          migration_bitmap_sync();


* Re: [Qemu-devel] [PATCH v7 35/42] Don't sync dirty bitmaps in postcopy
  2015-07-14 12:36   ` Juan Quintela
@ 2015-07-14 13:13     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-14 13:13 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, Dr. David Alan Gilbert (git),
	qemu-devel, luis, amit.shah, pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Once we're in postcopy the source processors are stopped and memory
> > shouldn't change any more, so there's no need to look at the dirty
> > map.
> >
> > There are two notes to this:
> >   1) If we do resync and a page had changed then the page would get
> >      sent again, which the destination wouldn't allow (since it might
> >      have also modified the page)
> >   2) Before disabling this I'd seen very rare cases where a page had been
> >      marked dirtied although the memory contents are apparently identical
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> 
> But, in what patch do we sync the migration bitmap after changing to postcopy?

It's called one last time in ram_postcopy_send_discard_bitmap; see:
v7-0025-Postcopy-Maintain-sentmap-and-calculate-discard.patch

and that happens at the start of postcopy mode, when the CPU is stopped and won't
be running on the source again.

Dave

> 
> > ---
> >  migration/ram.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 01a0ab4..5cff4d6 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1643,7 +1643,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
> >  {
> >      rcu_read_lock();
> >  
> > -    migration_bitmap_sync();
> > +    if (!migration_postcopy_phase(migrate_get_current())) {
> > +        migration_bitmap_sync();
> > +    }
> >  
> >      ram_control_before_iterate(f, RAM_CONTROL_FINISH);
> >  
> > @@ -1678,7 +1680,8 @@ static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
> >  
> >      remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;
> >  
> > -    if (remaining_size < max_size) {
> > +    if (!migration_postcopy_phase(migrate_get_current()) &&
> > +        remaining_size < max_size) {
> >          qemu_mutex_lock_iothread();
> >          rcu_read_lock();
> >          migration_bitmap_sync();
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 36/42] Host page!=target page: Cleanup bitmaps
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 36/42] Host page!=target page: Cleanup bitmaps Dr. David Alan Gilbert (git)
@ 2015-07-14 15:01   ` Juan Quintela
  2015-07-31 15:53     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-14 15:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Prior to the start of postcopy, ensure that everything that will
> be transferred later is a whole host-page in size.
>
> This is accomplished by discarding partially transferred host pages
> and marking any that are partially dirty as fully dirty.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

>  /*
> + * Helper for postcopy_chunk_hostpages where HPS/TPS >= bits-in-long
> + *
> + * !! Untested !!

You continue in the race for best comment ever O:-)


> + */
> +static int hostpage_big_chunk_helper(const char *block_name, void *host_addr,
> +                                     ram_addr_t offset, ram_addr_t length,
> +                                     void *opaque)
> +{
> +    MigrationState *ms = opaque;
> +    unsigned long long_bits = sizeof(long) * 8;
> +    unsigned int host_len = (qemu_host_page_size / TARGET_PAGE_SIZE) /
> +                            long_bits;
> +    unsigned long first_long, last_long, cur_long, current_hp;
> +    unsigned long first = offset >> TARGET_PAGE_BITS;
> +    unsigned long last = (offset + (length - 1)) >> TARGET_PAGE_BITS;
> +
> +    PostcopyDiscardState *pds = postcopy_discard_send_init(ms,
> +                                                           first,
> +                                                           block_name);

Minor

PostcopyDiscardState *pds =
                     postcopy_discard_send_init(ms, first, block_name);

??
> +    first_long = first / long_bits;
> +    last_long = last / long_bits;
> +
> +    /*
> +     * I'm assuming RAMBlocks must start at the start of host pages,
> +     * but I guess they might not use the whole of the host page
> +     */
> +
> +    /* Work along one host page at a time */
> +    for (current_hp = first_long; current_hp <= last_long;
> +         current_hp += host_len) {
> +        bool discard = 0;
> +        bool redirty = 0;
> +        bool has_some_dirty = false;
> +        bool has_some_undirty = false;
> +        bool has_some_sent = false;
> +        bool has_some_unsent = false;
> +
> +        /*
> +         * Check each long of mask for this hp, and see if anything
> +         * needs updating.
> +         */
> +        for (cur_long = current_hp; cur_long < (current_hp + host_len);
> +             cur_long++) {
> +            /* a chunk of sent pages */
> +            unsigned long sdata = ms->sentmap[cur_long];
> +            /* a chunk of dirty pages */
> +            unsigned long ddata = migration_bitmap[cur_long];
> +
> +            if (sdata) {
> +                has_some_sent = true;
> +            }
> +            if (sdata != ~0ul) {
> +                has_some_unsent = true;
> +            }
> +            if (ddata) {
> +                has_some_dirty = true;
> +            }
> +            if (ddata != ~0ul) {
> +                has_some_undirty = true;
> +            }
> +
> +        }

No need for this:

find_first_bit()
find_first_zero_bit()

You are walking all the words when a single search is enough?
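
A rough sketch of the search-based check, working on bit ranges rather than
whole longs. The sketch_find_next() helper is a hypothetical, deliberately
simple linear stand-in for find_next_bit()/find_next_zero_bit(); it is not
the QEMU implementation, and the [start, end) range convention is an
assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * 8)

static bool sketch_test_bit(const unsigned long *map, size_t nr)
{
    return (map[nr / SKETCH_BITS_PER_LONG] >> (nr % SKETCH_BITS_PER_LONG)) & 1ul;
}

/*
 * Index of the first bit in [start, end) whose value equals 'value',
 * or 'end' if there is none.
 */
static size_t sketch_find_next(const unsigned long *map, size_t start,
                               size_t end, bool value)
{
    size_t i;

    assert(start <= end);
    for (i = start; i < end; i++) {
        if (sketch_test_bit(map, i) == value) {
            return i;
        }
    }
    return end;
}

/*
 * A host page covering bits [start, end) is "partial" when it contains
 * at least one set bit and at least one clear bit.
 */
static bool sketch_range_is_partial(const unsigned long *map, size_t start,
                                    size_t end)
{
    return sketch_find_next(map, start, end, true) < end &&
           sketch_find_next(map, start, end, false) < end;
}
```

Calling sketch_range_is_partial() once on the sent map and once on the dirty
map would replace the four has_some_* flags, and because it works on bit
ranges it does not care whether a host page spans more or less than one long.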


> +
> +        if (has_some_sent && has_some_unsent) {
> +            /* Partially sent host page */
> +            discard = true;
> +            redirty = true;
> +        }
> +
> +        if (has_some_dirty && has_some_undirty) {
> +            /* Partially dirty host page */
> +            redirty = true;
> +        }
> +
> +        if (!discard && !redirty) {
> +            /* All consistent - next host page */
> +            continue;
> +        }
> +
> +
> +        /* Now walk the chunks again, sending discards etc */
> +        for (cur_long = current_hp; cur_long < (current_hp + host_len);
> +             cur_long++) {
> +            unsigned long cur_bits = cur_long * long_bits;
> +
> +            /* a chunk of sent pages */
> +            unsigned long sdata = ms->sentmap[cur_long];
> +            /* a chunk of dirty pages */
> +            unsigned long ddata = migration_bitmap[cur_long];
> +
> +            if (discard && sdata) {
> +                /* Tell the destination to discard these pages */
> +                postcopy_discard_send_range(ms, pds, cur_bits,
> +                                            cur_bits + long_bits - 1);
> +                /* And clear them in the sent data structure */
> +                ms->sentmap[cur_long] = 0;
> +            }
> +
> +            if (redirty) {
> +                migration_bitmap[cur_long] = ~0ul;
> +                /* Inc the count of dirty pages */
> +                migration_dirty_pages += ctpopl(~ddata);
> +            }
> +        }

Wouldn't creative use of bitmap_zero() and bitmap_fill(), plus a single
postcopy_discard_send_range() for the whole host page, be better?



> +    }
> +
> +    postcopy_discard_send_finish(ms, pds);
> +
> +    return 0;
> +}
> +
> +/*
> + * When working on long chunks of a bitmap where the only valid section
> + * is between start..end (inclusive), generate a mask with only those
> + * valid bits set for the current long word within that bitmask.
> + */
> +static unsigned long make_long_mask(unsigned long start, unsigned long end,
> +                                    unsigned long cur_long)
> +{
> +    unsigned long long_bits = sizeof(long) * 8;
> +    unsigned long long_bits_mask = long_bits - 1;
> +    unsigned long first_long, last_long;
> +    unsigned long mask = ~(unsigned long)0;
> +    first_long = start / long_bits ;
> +    last_long = end / long_bits;
> +
> +    if ((cur_long == first_long) && (start & long_bits_mask)) {
> +        /* e.g. (start & 31) = 3
> +         *         1 << .    -> 2^3
> +         *         . - 1     -> 2^3 - 1 i.e. mask 2..0
> +         *         ~.        -> mask 31..3
> +         */
> +        mask &= ~((((unsigned long)1) << (start & long_bits_mask)) - 1);

           start = start & long_bits_mask;
           bitmap_set(&mask, start, long_bits - start);

> +    }
> +
> +    if ((cur_long == last_long) && ((end & long_bits_mask) != long_bits_mask)) {
> +        /* e.g. (end & 31) = 3
> +         *            .   +1 -> 4
> +         *         1 << .    -> 2^4
> +         *         . -1      -> 2^4 - 1
> +         *                   = mask set 3..0
> +         */
> +        mask &= (((unsigned long)1) << ((end & long_bits_mask) + 1)) - 1;

           bitmap_set(&mask, 0, end);


Adjust +1/-1 depending on how you do limits?

BTW, when I need inspiration about how to code functions that deal with
bits, I look in bitmap.c.  Sometimes the function already exists, and
otherwise things like BITS_PER_LONG, etc., are already defined there.
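
A compact sketch of the same mask computation using two shifts, assuming the
patch's convention that start..end is inclusive. Note that a bitmap_set()
based version would have to start from an empty mask, since setting bits in
an all-ones mask changes nothing; this sketch keeps the patch's approach of
clearing bits from a full mask. The sketch_ names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Mask of the bits of long word 'cur_long' that fall inside the inclusive
 * bit range start..end of a bitmap.
 */
static unsigned long sketch_long_mask(unsigned long start, unsigned long end,
                                      unsigned long cur_long)
{
    unsigned long mask = ~0ul;
    unsigned long lo = start % SKETCH_BITS_PER_LONG;
    unsigned long hi = end % SKETCH_BITS_PER_LONG;

    assert(start <= end);

    if (cur_long == start / SKETCH_BITS_PER_LONG) {
        mask &= ~0ul << lo;                              /* drop bits below start */
    }
    if (cur_long == end / SKETCH_BITS_PER_LONG) {
        mask &= ~0ul >> (SKETCH_BITS_PER_LONG - 1 - hi); /* drop bits above end */
    }
    return mask;
}
```

Both shift counts stay in the range 0..BITS_PER_LONG-1, so no special case
is needed when start or end sits on a word boundary.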

> +    }
> +
> +    return mask;
> +}
> +
> +/*
> + * Utility for the outgoing postcopy code.
> + *
> + * Discard any partially sent host-page size chunks, mark any partially
> + * dirty host-page size chunks as all dirty.
> + *
> + * Returns: 0 on success
> + */
> +static int postcopy_chunk_hostpages(MigrationState *ms)
> +{
> +    struct RAMBlock *block;
> +    unsigned int host_bits = qemu_host_page_size / TARGET_PAGE_SIZE;
> +    unsigned long long_bits = sizeof(long) * 8;
> +    unsigned long host_mask;
> +
> +    assert(is_power_of_2(host_bits));
> +
> +    if (qemu_host_page_size == TARGET_PAGE_SIZE) {
> +        /* Easy case - TPS==HPS - nothing to be done */
> +        return 0;
> +    }
> +
> +    /* Easiest way to make sure we don't resume in the middle of a host-page */
> +    last_seen_block = NULL;
> +    last_sent_block = NULL;

Best names ever.  And you have to blame me at least for the second one
to appear :p


> +
> +    /*
> +     * The currently worst known ratio is ARM that has 1kB target pages, and
> +     * can have 64kB host pages, which is thus inconveniently larger than a long
> +     * on ARM (32bits), and a long is the underlying element of the migration
> +     * bitmaps.
> +     */
> +    if (host_bits >= long_bits) {
> +        /* Deal with the odd case separately */
> +        return qemu_ram_foreach_block(hostpage_big_chunk_helper, ms);
> +    } else {
> +        host_mask =  (1ul << host_bits) - 1;
> +    }

You can remove the else entirely and just put the code at top level.

So, we have three cases:

- host_bits == target_bits -> NOP
- host_bits >= long_bits
- host_bits < long_bits

Couldn't we merge the last two?  They are very similar, and having two
code paths looks like too much to me.

> @@ -1405,9 +1664,17 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms)
>      int ret;
>  
>      rcu_read_lock();
> +
Another one that is not needed.


* Re: [Qemu-devel] [PATCH v7 37/42] Postcopy; Handle userfault requests
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 37/42] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
@ 2015-07-14 15:10   ` Juan Quintela
  2015-07-14 15:15     ` Dr. David Alan Gilbert
  2015-07-27 14:29   ` Amit Shah
  1 sibling, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-14 15:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> userfaultfd is a Linux syscall that gives an fd that receives a stream
> of notifications of accesses to pages registered with it and allows
> the program to acknowledge those stalls and tell the accessing
> thread to carry on.
>
> We convert the requests from the kernel into messages back to the
> source asking for the pages.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>


> @@ -274,15 +276,41 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
>   */
>  int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
>  {
> -    /* TODO: Join the fault thread once we're sure it will exit */
> -    if (qemu_ram_foreach_block(cleanup_area, mis)) {
> -        return -1;
> +    trace_postcopy_ram_incoming_cleanup_entry();
> +
> +    if (mis->have_fault_thread) {
> +        uint64_t tmp64;
> +
> +        if (qemu_ram_foreach_block(cleanup_area, mis)) {
> +            return -1;
> +        }
> +        /*
> +         * Tell the fault_thread to exit, it's an eventfd that should
> +         * currently be at 0, we're going to inc it to 1
> +         */
> +        tmp64 = 1;
> +        if (write(mis->userfault_quit_fd, &tmp64, 8) == 8) {
> +            trace_postcopy_ram_incoming_cleanup_join();
> +            qemu_thread_join(&mis->fault_thread);
> +        } else {
> +            /* Not much we can do here, but may as well report it */
> +            error_report("%s: incing userfault_quit_fd: %s", __func__,
> +                         strerror(errno));

"incing"???
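
For reference, a minimal Linux-only sketch of the eventfd signalling that the
quoted comment describes: the counter starts at 0, the cleanup path writes an
8-byte 1 to increment it, and the waiting side sees the fd become readable.
The sketch_ function names are hypothetical; only eventfd(), write() and
read() are real calls, and the fault thread and poll loop are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Increment the eventfd counter by one.
 * Returns 0 on success, -1 on a failed or short write.
 */
static int sketch_signal_quit(int quit_fd)
{
    uint64_t one = 1;

    assert(quit_fd >= 0);
    return write(quit_fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Read and reset the counter.
 * Returns the value read, or 0 if nothing could be read (for example a
 * non-blocking fd whose counter is still 0).
 */
static uint64_t sketch_consume_quit(int quit_fd)
{
    uint64_t value = 0;

    if (read(quit_fd, &value, sizeof(value)) != (ssize_t)sizeof(value)) {
        return 0;
    }
    return value;
}
```

Using sizeof(one) rather than a literal 8 keeps the write size tied to the
type that eventfd requires.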


* Re: [Qemu-devel] [PATCH v7 38/42] Start up a postcopy/listener thread ready for incoming page data
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 38/42] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
@ 2015-07-14 15:12   ` Juan Quintela
  0 siblings, 0 replies; 209+ messages in thread
From: Juan Quintela @ 2015-07-14 15:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> The loading of a device state (during postcopy) may access guest
> memory that's still on the source machine and thus might need
> a page fill; split off a separate thread that handles the incoming
> page data so that the original incoming migration code can finish
> off the device data.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>


* Re: [Qemu-devel] [PATCH v7 39/42] postcopy: Wire up loadvm_postcopy_handle_ commands
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 39/42] postcopy: Wire up loadvm_postcopy_handle_ commands Dr. David Alan Gilbert (git)
@ 2015-07-14 15:14   ` Juan Quintela
  2015-07-28  5:53   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Juan Quintela @ 2015-07-14 15:14 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Wire up more of the handlers for the commands on the destination side,
> in particular loadvm_postcopy_handle_run now has enough to start the
> guest running.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

As said before, I don't like the nested protocol handling.  But I have
no better suggestions at this time :p


* Re: [Qemu-devel] [PATCH v7 37/42] Postcopy; Handle userfault requests
  2015-07-14 15:10   ` Juan Quintela
@ 2015-07-14 15:15     ` Dr. David Alan Gilbert
  2015-07-14 15:25       ` Juan Quintela
  0 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-14 15:15 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > userfaultfd is a Linux syscall that gives an fd that receives a stream
> > of notifications of accesses to pages registered with it and allows
> > the program to acknowledge those stalls and tell the accessing
> > thread to carry on.
> >
> > We convert the requests from the kernel into messages back to the
> > source asking for the pages.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> 
> > @@ -274,15 +276,41 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
> >   */
> >  int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
> >  {
> > -    /* TODO: Join the fault thread once we're sure it will exit */
> > -    if (qemu_ram_foreach_block(cleanup_area, mis)) {
> > -        return -1;
> > +    trace_postcopy_ram_incoming_cleanup_entry();
> > +
> > +    if (mis->have_fault_thread) {
> > +        uint64_t tmp64;
> > +
> > +        if (qemu_ram_foreach_block(cleanup_area, mis)) {
> > +            return -1;
> > +        }
> > +        /*
> > +         * Tell the fault_thread to exit, it's an eventfd that should
> > +         * currently be at 0, we're going to inc it to 1
> > +         */
> > +        tmp64 = 1;
> > +        if (write(mis->userfault_quit_fd, &tmp64, 8) == 8) {
> > +            trace_postcopy_ram_incoming_cleanup_join();
> > +            qemu_thread_join(&mis->fault_thread);
> > +        } else {
> > +            /* Not much we can do here, but may as well report it */
> > +            error_report("%s: incing userfault_quit_fd: %s", __func__,
> > +                         strerror(errno));
> 
> "incing"???

Oh, incrementing :-)
Changed.
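
For anyone following along, the quit signalling in the hunk above boils
down to something like the following.  This is only a minimal sketch
under my own assumptions, not the code from the patch: request_quit()
and quit_requested() are hypothetical names, and the real fault thread
would normally poll the quit fd together with the userfault fd rather
than on its own.

```c
#include <poll.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: bump the eventfd counter from 0 to 1 so that a
 * thread polling the fd wakes up.  An eventfd write is always 8 bytes.
 */
static int request_quit(int quit_fd)
{
    uint64_t one = 1;

    if (write(quit_fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
        return -1;
    }
    return 0;
}

/*
 * Hypothetical helper: non-blocking check, as the fault thread might do
 * at the top of its loop.  The fd is readable once the counter is
 * non-zero.
 */
static bool quit_requested(int quit_fd)
{
    struct pollfd pfd = { .fd = quit_fd, .events = POLLIN };

    return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN);
}
```

The caller would create the fd with eventfd(0, 0), call request_quit()
and then join the thread, which is the order the hunk uses.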

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 40/42] End of migration for postcopy
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 40/42] End of migration for postcopy Dr. David Alan Gilbert (git)
@ 2015-07-14 15:15   ` Juan Quintela
  2015-07-28  5:55   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Juan Quintela @ 2015-07-14 15:15 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Tweak the end of migration cleanup; we don't want to close stuff down
> at the end of the main stream, since the postcopy is still sending pages
> on the other thread.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 41/42] Disable mlock around incoming postcopy
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 41/42] Disable mlock around incoming postcopy Dr. David Alan Gilbert (git)
@ 2015-07-14 15:22   ` Juan Quintela
  2015-07-28  6:02     ` Amit Shah
  2015-09-24 10:36     ` Dr. David Alan Gilbert
  2015-07-28  6:02   ` Amit Shah
  1 sibling, 2 replies; 209+ messages in thread
From: Juan Quintela @ 2015-07-14 15:22 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Userfault doesn't work with mlock; mlock is designed to nail down pages
> so they don't move, userfault is designed to tell you when they're not
> there.
>
> munlock the pages we userfault protect before postcopy.
> mlock everything again at the end if mlock is enabled.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  include/sysemu/sysemu.h  |  1 +
>  migration/postcopy-ram.c | 24 ++++++++++++++++++++++++
>  2 files changed, 25 insertions(+)
>
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 1af2ea0..c1f3da4 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -171,6 +171,7 @@ extern int boot_menu;
>  extern bool boot_strict;
>  extern uint8_t *boot_splash_filedata;
>  extern size_t boot_splash_filedata_size;
> +extern bool enable_mlock;
>  extern uint8_t qemu_extra_params_fw[2];
>  extern QEMUClockType rtc_clock;
>  extern const char *mem_path;
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 7eb1fb9..be7e5f2 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -85,6 +85,11 @@ static bool ufd_version_check(int ufd)
>      return true;
>  }
>  
> +/*
> + * Note: This has the side effect of munlock'ing all of RAM, that's
> + * normally fine since if the postcopy succeeds it gets turned back on at the
> + * end.
> + */
>  bool postcopy_ram_supported_by_host(void)
>  {
>      long pagesize = getpagesize();
> @@ -113,6 +118,15 @@ bool postcopy_ram_supported_by_host(void)
>      }
>  
>      /*
> +     * userfault and mlock don't go together; we'll put it back later if
> +     * it was enabled.
> +     */
> +    if (munlockall()) {
> +        error_report("%s: munlockall: %s", __func__,  strerror(errno));


Why is this not protected by enable_mlock?

> +        return -1;
> +    }
> +
> +    /*
>       *  We need to check that the ops we need are supported on anon memory
>       *  To do that we need to register a chunk and see the flags that
>       *  are returned.
> @@ -303,6 +317,16 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
>          mis->have_fault_thread = false;
>      }
>  
> +    if (enable_mlock) {
> +        if (os_mlock() < 0) {
> +            error_report("mlock: %s", strerror(errno));
> +            /*
> +             * It doesn't feel right to fail at this point, we have a valid
> +             * VM state.
> +             */

realtime_init() exits if os_mlock() fails, so the current flow is:

- we start qemu with an mlock request
- we mlock memory
- we start postcopy
- we munlock memory
- we mlock memory

I would really, really prefer having a check for whether memory is mlocked
and, in that case, just aborting the migration altogether.  Or better still,
wait to enable mlock *until* we have finished postcopy, no?
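
To make the first point concrete, a guarded version of the unlock/relock
pair could look roughly like this.  It is a sketch under assumptions,
not a proposed patch: postcopy_unlock_ram() and postcopy_relock_ram()
are hypothetical names, the flag is passed in rather than read from the
global enable_mlock, and I am assuming the relock is an
mlockall(MCL_CURRENT | MCL_FUTURE) call.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: only undo the lock if the user asked for one. */
static int postcopy_unlock_ram(bool mlock_enabled)
{
    if (!mlock_enabled) {
        return 0;   /* nothing was locked, so nothing to undo */
    }
    if (munlockall()) {
        fprintf(stderr, "munlockall: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}

/* Hypothetical helper: put the lock back once postcopy has finished. */
static int postcopy_relock_ram(bool mlock_enabled)
{
    if (!mlock_enabled) {
        return 0;
    }
    if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
        fprintf(stderr, "mlockall: %s\n", strerror(errno));
        return -1;  /* the caller decides whether this is fatal */
    }
    return 0;
}
```

Whether a failed relock should be fatal is exactly the open question
here, so the sketch just hands the error back to the caller.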

Later, Juan.

> +        }
> +    }
> +
>      postcopy_state_set(mis, POSTCOPY_INCOMING_END);
>      migrate_send_rp_shut(mis, qemu_file_get_error(mis->file) != 0);

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 42/42] Inhibit ballooning during postcopy
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 42/42] Inhibit ballooning during postcopy Dr. David Alan Gilbert (git)
@ 2015-07-14 15:24   ` Juan Quintela
  2015-07-28  6:15   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Juan Quintela @ 2015-07-14 15:24 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Postcopy detects accesses to pages that haven't been transferred yet
> using userfaultfd, and it causes exceptions on pages that are 'not
> present'.
> Ballooning also causes pages to be marked as 'not present' when the
> guest inflates the balloon.
> Potentially a balloon could be inflated to discard pages that are
> currently inflight during postcopy and that may be arriving at about
> the same time.
>
> To avoid this confusion, disable ballooning during postcopy.
>
> When disabled we drop balloon requests from the guest.  Since ballooning
> is generally initiated by the host, the management system should avoid
> initiating any balloon instructions to the guest during migration,
> although it's not possible to know how long it would take a guest to
> process a request made prior to the start of migration.
>
> Queueing the requests until after migration would be nice, but is
> non-trivial, since the set of inflate/deflate requests have to
> be compared with the state of the page to know what the final
> outcome is allowed to be.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 37/42] Postcopy; Handle userfault requests
  2015-07-14 15:15     ` Dr. David Alan Gilbert
@ 2015-07-14 15:25       ` Juan Quintela
  0 siblings, 0 replies; 209+ messages in thread
From: Juan Quintela @ 2015-07-14 15:25 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
>> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
>> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> >
>> > userfaultfd is a Linux syscall that gives an fd that receives a stream
>> > of notifications of accesses to pages registered with it and allows
>> > the program to acknowledge those stalls and tell the accessing
>> > thread to carry on.
>> >
>> > We convert the requests from the kernel into messages back to the
>> > source asking for the pages.
>> >
>> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> 
>> 
>> > @@ -274,15 +276,41 @@ int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
>> >   */
>> >  int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
>> >  {
>> > -    /* TODO: Join the fault thread once we're sure it will exit */
>> > -    if (qemu_ram_foreach_block(cleanup_area, mis)) {
>> > -        return -1;
>> > +    trace_postcopy_ram_incoming_cleanup_entry();
>> > +
>> > +    if (mis->have_fault_thread) {
>> > +        uint64_t tmp64;
>> > +
>> > +        if (qemu_ram_foreach_block(cleanup_area, mis)) {
>> > +            return -1;
>> > +        }
>> > +        /*
>> > +         * Tell the fault_thread to exit, it's an eventfd that should
>> > +         * currently be at 0, we're going to inc it to 1
>> > +         */
>> > +        tmp64 = 1;
>> > +        if (write(mis->userfault_quit_fd, &tmp64, 8) == 8) {
>> > +            trace_postcopy_ram_incoming_cleanup_join();
>> > +            qemu_thread_join(&mis->fault_thread);
>> > +        } else {
>> > +            /* Not much we can do here, but may as well report it */
>> > +            error_report("%s: incing userfault_quit_fd: %s", __func__,
>> > +                         strerror(errno));
>> 
>> "incing"???
>
> Oh, incrementing :-)
> Changed.
>
> Dave
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 14/42] Return path: Send responses from destination to source
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 14/42] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
  2015-06-17 16:30   ` Juan Quintela
@ 2015-07-15  7:31   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-15  7:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:27], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add migrate_send_rp_message to send a message from destination to source along the return path.
>   (It uses a mutex to let it be called from multiple threads)
> Add migrate_send_rp_shut to send a 'shut' message to indicate
>   the destination is finished with the RP.
> Add migrate_send_rp_ack to send a 'PONG' message in response to a PING
>   Use it in the MSG_RP_PING handler
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 15/42] Return path: Source handling of return path
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 15/42] Return path: Source handling of return path Dr. David Alan Gilbert (git)
  2015-07-13 10:29   ` Juan Quintela
@ 2015-07-15  7:50   ` Amit Shah
  2015-07-16 11:32     ` Dr. David Alan Gilbert
  2015-08-05  8:06   ` zhanghailiang
  2 siblings, 1 reply; 209+ messages in thread
From: Amit Shah @ 2015-07-15  7:50 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:28], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Open a return path, and handle messages that are received upon it.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>


> -/* migration thread support */
> +/*
> + * Something bad happened to the RP stream, mark an error
> + * The caller shall print something to indicate why
> + */
> +static void source_return_path_bad(MigrationState *s)

Can you rename this to something like

mark_source_rp_bad()

?

Intent is clearer that way.

Also, the comment says the caller will print something, but the
invocations below are a mix of printfs and traces.  I'm not saying the
caller always has to print, but maybe the comment needs an update.

> +{
> +    s->rp_state.error = true;
> +    migrate_fd_cleanup_src_rp(s);
> +}
> +
> +/*
> + * Handles messages sent on the return path towards the source VM
> + *
> + */
> +static void *source_return_path_thread(void *opaque)
> +{
> +    MigrationState *ms = opaque;
> +    QEMUFile *rp = ms->rp_state.file;
> +    uint16_t expected_len, header_len, header_type;
> +    const int max_len = 512;
> +    uint8_t buf[max_len];
> +    uint32_t tmp32;
> +    int res;
> +
> +    trace_source_return_path_thread_entry();
> +    while (rp && !qemu_file_get_error(rp) &&
> +        migration_already_active(ms)) {
> +        trace_source_return_path_thread_loop_top();
> +        header_type = qemu_get_be16(rp);
> +        header_len = qemu_get_be16(rp);
> +
> +        switch (header_type) {
> +        case MIG_RP_MSG_SHUT:
> +        case MIG_RP_MSG_PONG:
> +            expected_len = 4;
> +            break;
> +
> +        default:
> +            error_report("RP: Received invalid message 0x%04x length 0x%04x",
> +                    header_type, header_len);
> +            source_return_path_bad(ms);
> +            goto out;
> +        }
>  
> +        if (header_len > expected_len) {
> +            error_report("RP: Received message 0x%04x with"
> +                    "incorrect length %d expecting %d",
> +                    header_type, header_len,
> +                    expected_len);
> +            source_return_path_bad(ms);
> +            goto out;
> +        }
> +
> +        /* We know we've got a valid header by this point */
> +        res = qemu_get_buffer(rp, buf, header_len);
> +        if (res != header_len) {
> +            trace_source_return_path_thread_failed_read_cmd_data();
> +            source_return_path_bad(ms);
> +            goto out;
> +        }

		Amit

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 16/42] Rework loadvm path for subloops
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 16/42] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
  2015-07-13 10:33   ` Juan Quintela
@ 2015-07-15  9:34   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-15  9:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:29], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Postcopy needs to have two migration streams loading concurrently;
> one from memory (with the device state) and the other from the fd
> with the memory transactions.
> 
> Split the core of qemu_loadvm_state out so we can use it for both.
> 
> Allow the inner loadvm loop to quit and cause the parent loops to
> exit as well.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 17/42] Add migration-capability boolean for postcopy-ram.
  2015-06-16 15:58     ` Dr. David Alan Gilbert
@ 2015-07-15  9:39       ` Amit Shah
  0 siblings, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-15  9:39 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [16:58:17], Dr. David Alan Gilbert wrote:
> * Eric Blake (eblake@redhat.com) wrote:
> > On 06/16/2015 04:26 AM, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > The 'postcopy ram' capability allows postcopy migration of RAM;
> > > note that the migration starts off in precopy mode until
> > > postcopy mode is triggered (see the migrate_start_postcopy
> > > patch later in the series).
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > ---
> > >  include/migration/migration.h |  1 +
> > >  migration/migration.c         | 23 +++++++++++++++++++++++
> > >  qapi-schema.json              |  6 +++++-
> > >  3 files changed, 29 insertions(+), 1 deletion(-)
> > > 
> > 
> > > +++ b/qapi-schema.json
> > > @@ -526,11 +526,15 @@
> > >  # @auto-converge: If enabled, QEMU will automatically throttle down the guest
> > >  #          to speed up convergence of RAM migration. (since 1.6)
> > >  #
> > > +# @x-postcopy-ram: Start executing on the migration target before all of RAM has
> > > +#          been migrated, pulling the remaining pages along as needed. NOTE: If
> > > +#          the migration fails during postcopy the VM will fail.  (since 2.4)
> > 
> > Marking it experimental because it might change?  Or is the interface
> > pretty stable, but you want more testing time to minimize bugs?
> 
> It's easy enough to remove the x-  once we're all happy;  it seems pretty
> stable at the moment but when we're done I'll just submit a one liner to take the x-
> off.

We shouldn't mark it stable till we have a released kernel (which
freezes the kernel API for us).

We could pick this patchset without a released kernel, but this will
have to remain x- if that happens.

		Amit

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 17/42] Add migration-capability boolean for postcopy-ram.
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 17/42] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
  2015-06-16 15:43   ` Eric Blake
  2015-07-13 10:35   ` Juan Quintela
@ 2015-07-15  9:40   ` Amit Shah
  2 siblings, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-15  9:40 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:30], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> The 'postcopy ram' capability allows postcopy migration of RAM;
> note that the migration starts off in precopy mode until
> postcopy mode is triggered (see the migrate_start_postcopy
> patch later in the series).
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 15/42] Return path: Source handling of return path
  2015-07-15  7:50   ` Amit Shah
@ 2015-07-16 11:32     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-16 11:32 UTC (permalink / raw)
  To: Amit Shah
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 16 Jun 2015 [11:26:28], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Open a return path, and handle messages that are received upon it.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> 
> > -/* migration thread support */
> > +/*
> > + * Something bad happened to the RP stream, mark an error
> > + * The caller shall print something to indicate why
> > + */
> > +static void source_return_path_bad(MigrationState *s)
> 
> Can you rename this to something like
> 
> mark_source_rp_bad()
> 
> ?
> 
> Intent is clearer that way.

Done.

> Also, the comment says caller will print something, but the
> invocations below are a mix of printfs and traces.  Not saying the
> caller has to print always, but maybe only comment needs update.

Yes, I've changed the comment, and changed one of the traces into
an error_report.

Thanks,

Dave

> 
> > +{
> > +    s->rp_state.error = true;
> > +    migrate_fd_cleanup_src_rp(s);
> > +}
> > +
> > +/*
> > + * Handles messages sent on the return path towards the source VM
> > + *
> > + */
> > +static void *source_return_path_thread(void *opaque)
> > +{
> > +    MigrationState *ms = opaque;
> > +    QEMUFile *rp = ms->rp_state.file;
> > +    uint16_t expected_len, header_len, header_type;
> > +    const int max_len = 512;
> > +    uint8_t buf[max_len];
> > +    uint32_t tmp32;
> > +    int res;
> > +
> > +    trace_source_return_path_thread_entry();
> > +    while (rp && !qemu_file_get_error(rp) &&
> > +        migration_already_active(ms)) {
> > +        trace_source_return_path_thread_loop_top();
> > +        header_type = qemu_get_be16(rp);
> > +        header_len = qemu_get_be16(rp);
> > +
> > +        switch (header_type) {
> > +        case MIG_RP_MSG_SHUT:
> > +        case MIG_RP_MSG_PONG:
> > +            expected_len = 4;
> > +            break;
> > +
> > +        default:
> > +            error_report("RP: Received invalid message 0x%04x length 0x%04x",
> > +                    header_type, header_len);
> > +            source_return_path_bad(ms);
> > +            goto out;
> > +        }
> >  
> > +        if (header_len > expected_len) {
> > +            error_report("RP: Received message 0x%04x with"
> > +                    "incorrect length %d expecting %d",
> > +                    header_type, header_len,
> > +                    expected_len);
> > +            source_return_path_bad(ms);
> > +            goto out;
> > +        }
> > +
> > +        /* We know we've got a valid header by this point */
> > +        res = qemu_get_buffer(rp, buf, header_len);
> > +        if (res != header_len) {
> > +            trace_source_return_path_thread_failed_read_cmd_data();
> > +            source_return_path_bad(ms);
> > +            goto out;
> > +        }
> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 34/42] Postcopy: Use helpers to map pages during migration
  2015-07-14 12:34   ` Juan Quintela
@ 2015-07-17 17:31     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-17 17:31 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, Dr. David Alan Gilbert (git),
	qemu-devel, luis, amit.shah, pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > In postcopy, the destination guest is running at the same time
> > as it's receiving pages; as we receive new pages we must put
> > them into the guests address space atomically to avoid a running
> > CPU accessing a partially written page.
> >
> > Use the helpers in postcopy-ram.c to map these pages.
> >
> > qemu_get_buffer_less_copy is used to avoid a copy out of qemu_file
> > in the case that postcopy is going to do a copy anyway.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > @@ -1742,7 +1752,6 @@ static inline void *host_from_stream_offset(QEMUFile *f,
> >              error_report("Ack, bad migration stream!");
> >              return NULL;
> >          }
> > -
> 
> Dont' belong here O:-)

Oops, gone.

> >          return memory_region_get_ram_ptr(block->mr) + offset;
> >      }
> >  
> > @@ -1881,6 +1890,16 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >      int flags = 0, ret = 0;
> >      static uint64_t seq_iter;
> >      int len = 0;
> > +    /*
> > +     * System is running in postcopy mode, page inserts to host memory must be
> > +     * atomic
> > +     */
> > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > +    bool postcopy_running = postcopy_state_get(mis) >=
> > +                            POSTCOPY_INCOMING_LISTENING;
> > +    void *postcopy_host_page = NULL;
> > +    bool postcopy_place_needed = false;
> > +    bool matching_page_sizes = qemu_host_page_size == TARGET_PAGE_SIZE;
> >  
> >      seq_iter++;
> >  
> > @@ -1896,13 +1915,57 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >      rcu_read_lock();
> >      while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
> >          ram_addr_t addr, total_ram_bytes;
> > -        void *host;
> > +        void *host = 0;
> > +        void *page_buffer = 0;
> > +        void *postcopy_place_source = 0;
> 
> NULL, NULL, NULL?

Fixed.

> BTW, do we really need postcopy_place_source?  I think that just doing
> s/postcopy_place_source/postcopy_host_page/ would do?

They are not always the same.  In the host-page size = target-page size case
we make use of  qemu_get_buffer_in_place():

+                qemu_get_buffer_in_place(f, (uint8_t **)&postcopy_place_source,
+                                         TARGET_PAGE_SIZE);

depending on the alignment of the buffer in the stream that *may* change
postcopy_place_source to just point into the qemu_file buffer and then we pluck
the data straight out of there without an extra copy.

> >          uint8_t ch;
> > +        bool all_zero = false;
> >  
> >          addr = qemu_get_be64(f);
> >          flags = addr & ~TARGET_PAGE_MASK;
> >          addr &= TARGET_PAGE_MASK;
> >  
> > +        if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
> > +                     RAM_SAVE_FLAG_XBZRLE)) {
> > +            host = host_from_stream_offset(f, mis, addr, flags);
> > +            if (!host) {
> > +                error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
> > +                ret = -EINVAL;
> > +                break;
> > +            }
> > +            if (!postcopy_running) {
> > +                page_buffer = host;
> > +            } else {
> > +                /*
> > +                 * Postcopy requires that we place whole host pages atomically.
> > +                 * To make it atomic, the data is read into a temporary page
> > +                 * that's moved into place later.
> > +                 * The migration protocol uses,  possibly smaller, target-pages
> > +                 * however the source ensures it always sends all the components
> > +                 * of a host page in order.
> > +                 */
> > +                if (!postcopy_host_page) {
> > +                    postcopy_host_page = postcopy_get_tmp_page(mis);
> > +                }
> > +                page_buffer = postcopy_host_page +
> > +                              ((uintptr_t)host & ~qemu_host_page_mask);
> > +                /* If all TP are zero then we can optimise the place */
> > +                if (!((uintptr_t)host & ~qemu_host_page_mask)) {

Let's include the next line to make it easier to understand:
    +                    all_zero = true;
    +                }

> I don't understand the test, the comment or both :-(
>
> How you arrive from that test that this is a page full of zeros is a
> mistery to me :p

We end up at this code at the start of each target page received
on the stream.  That condition is true if we're at the 1st target page
within a host page - i.e. the bottom bits (~qemu_host_page_mask) of
the host address of the page are all zero (the !).  When we are at the 1st target
page in the host page, we initialise a flag 'all_zero' to be true,
which so far must be the case since we've not written anything.  If we
receive any page that's non-zero we clear that flag.
If we get to the end of the host-page and find that all_zero is still true, then
we can avoid another copy.

So the important thing here is that we've not yet convinced ourselves
that the page is full of zeros; it's just we know we've not read anything
in the host page yet, so we assume it's all zeros until we find out otherwise.
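
If it helps, here is the same idea as a standalone toy.  It is a sketch
only, not the code from the patch: the page sizes, HostPageAssembly and
assembly_add_target_page() are made up for the illustration, and the toy
spots zero pages by scanning the data whereas the real stream flags them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy sizes: four target pages make up one host page. */
enum { TOY_TARGET_PAGE = 4096, TOY_HOST_PAGE = 16384 };

typedef struct {
    uint8_t tmp_page[TOY_HOST_PAGE]; /* assembled here, placed later */
    bool all_zero;                   /* still unproven until the end */
} HostPageAssembly;

static bool toy_buffer_is_zero(const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/*
 * Feed one target page that belongs at byte 'offset' within the host
 * page.  Returns true once the last target page of the host page has
 * arrived, at which point all_zero says whether the cheap zero-page
 * placement can be used.
 */
static bool assembly_add_target_page(HostPageAssembly *a, size_t offset,
                                     const uint8_t *data)
{
    assert(offset % TOY_TARGET_PAGE == 0 && offset < TOY_HOST_PAGE);

    if (offset == 0) {
        /* 1st target page: nothing written yet, so assume all zero */
        a->all_zero = true;
    }
    if (!toy_buffer_is_zero(data, TOY_TARGET_PAGE)) {
        a->all_zero = false;
    }
    memcpy(a->tmp_page + offset, data, TOY_TARGET_PAGE);

    return offset + TOY_TARGET_PAGE == TOY_HOST_PAGE;
}
```

The flag is only trustworthy when the function returns true; before
that it just means "nothing non-zero seen so far".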

> Head hurts, would try to convince myself that the rest of changes are ok.

Yes, I'll take a week off to recover from the explanation.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 18/42] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 18/42] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
  2015-07-13 11:02   ` Juan Quintela
@ 2015-07-20 10:06   ` Amit Shah
  2015-07-27  9:55     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 209+ messages in thread
From: Amit Shah @ 2015-07-20 10:06 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:31], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> The state of the postcopy process is managed via a series of messages;
>    * Add wrappers and handlers for sending/receiving these messages
>    * Add state variable that track the current state of postcopy
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

But:

> +void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
> +                                           uint16_t len,
> +                                           uint64_t *start_list,
> +                                           uint64_t *end_list)
> +{
> +    uint8_t *buf;
> +    uint16_t tmplen;
> +    uint16_t t;
> +    size_t name_len = strlen(name);
> +
> +    trace_qemu_savevm_send_postcopy_ram_discard(name, len);
> +    buf = g_malloc0(len*16 + name_len + 3);
> +    buf[0] = 0; /* Version */
> +    assert(name_len < 256);
> +    buf[1] = name_len;
> +    memcpy(buf+2, name, name_len);
> +    tmplen = 2+name_len;
> +    buf[tmplen++] = '\0';

whitespace around operators missing

> +static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
> +                                              uint16_t len)

> +    len -= 3+strlen(ramid);

ditto

		Amit

^ permalink raw reply	[flat|nested] 209+ messages in thread

* Re: [Qemu-devel] [PATCH v7 18/42] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
  2015-07-13 11:02   ` Juan Quintela
@ 2015-07-20 10:13     ` Amit Shah
  2015-08-26 14:48     ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-20 10:13 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, Dr. David Alan Gilbert (git),
	qemu-devel, luis, pbonzini, david

On (Mon) 13 Jul 2015 [13:02:09], Juan Quintela wrote:

> > +    /* We're expecting a
> > +     *    Version (0)
> > +     *    a RAM ID string (length byte, name, 0 term)
> > +     *    then at least 1 16 byte chunk
> > +    */
> > +    if (len < 20) { 1 +
> 
>        1+1+1+1+2*8
> 
> Humm, thinking about it, .... why are we not needing a length field of
> number of entries?

hm, yea.

> > +        error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
> > +        return -1;
> > +    }
> > +
> > +    tmp = qemu_get_byte(mis->file);
> > +    if (tmp != 0) {
> 
> I think that a constant telling POSTCOPY_VERSION0 or whatever?

agreed.
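
A rough sketch of what the two suggestions above could look like together, as standalone C rather than QEMU code. All the names (POSTCOPY_RAM_DISCARD_VERSION, DISCARD_MIN_LEN, check_discard_header(), discard_range_count()) are made up for illustration; the patch itself uses the literals 0 and 20. The idea that the entry count can be derived from the remaining length is only one possible reading of the receive side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the patch uses bare literals instead. */
enum {
    POSTCOPY_RAM_DISCARD_VERSION = 0,
    DISCARD_RANGE_SIZE = 2 * 8, /* start + end, 8 bytes each */
    /* version + name length byte + 1 name char + NUL + one range */
    DISCARD_MIN_LEN = 1 + 1 + 1 + 1 + DISCARD_RANGE_SIZE,
};

/* Returns 0 if the length and version are acceptable, -1 otherwise. */
static int check_discard_header(const uint8_t *buf, size_t len)
{
    if (len < DISCARD_MIN_LEN) {
        return -1;
    }
    if (buf[0] != POSTCOPY_RAM_DISCARD_VERSION) {
        return -1;
    }
    return 0;
}

/*
 * Number of 16-byte ranges left once the 3 fixed bytes and the name
 * have been accounted for, or -1 if the remainder is empty or is not
 * a whole number of ranges.
 */
static long discard_range_count(size_t len, size_t name_len)
{
    size_t header = 3 + name_len;

    if (len <= header || (len - header) % DISCARD_RANGE_SIZE != 0) {
        return -1;
    }
    return (long)((len - header) / DISCARD_RANGE_SIZE);
}
```

If the count really is implied by the length like this, an explicit entry-count field would be redundant, but it would still make malformed messages easier to detect.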

		Amit


* Re: [Qemu-devel] [PATCH v7 19/42] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 19/42] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
  2015-07-13 11:07   ` Juan Quintela
@ 2015-07-21  6:11   ` Amit Shah
  2015-07-27 17:28     ` Dr. David Alan Gilbert
  2015-08-04  5:27   ` Amit Shah
  2 siblings, 1 reply; 209+ messages in thread
From: Amit Shah @ 2015-07-21  6:11 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:32], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> MIG_CMD_PACKAGED is a migration command that wraps a chunk of migration
> stream inside a package whose length can be determined purely by reading
> its header.  The destination guarantees that the whole MIG_CMD_PACKAGED
> is read off the stream prior to parsing the contents.
> 
> This is used by postcopy to load device state (from the package)
> while leaving the main stream free to receive memory pages.

Not sure why this is necessary.  I suppose I'll have to go read the
documentation in patch 1..

However:

> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -718,6 +718,50 @@ void qemu_savevm_send_open_return_path(QEMUFile *f)
>      qemu_savevm_command_send(f, MIG_CMD_OPEN_RETURN_PATH, 0, NULL);
>  }
>  
> +/* We have a buffer of data to send; we don't want that all to be loaded
> + * by the command itself, so the command contains just the length of the
> + * extra buffer that we then send straight after it.
> + * TODO: Must be a better way to organise that
> + *
> + * Returns:
> + *    0 on success
> + *    -ve on error
> + */
> +int qemu_savevm_send_packaged(QEMUFile *f, const QEMUSizedBuffer *qsb)
> +{
> +    size_t cur_iov;
> +    size_t len = qsb_get_length(qsb);
> +    uint32_t tmp;
> +
> +    if (len > MAX_VM_CMD_PACKAGED_SIZE) {
> +        error_report("%s: Unreasonably large packaged state: %zu",
> +                     __func__, len);
> +        return -1;
> +    }
> +
> +    tmp = cpu_to_be32(len);
> +
> +    trace_qemu_savevm_send_packaged();
> +    qemu_savevm_command_send(f, MIG_CMD_PACKAGED, 4, (uint8_t *)&tmp);
> +
> +    /* all the data follows (concatenating the iovs) */
> +    for (cur_iov = 0; cur_iov < qsb->n_iov; cur_iov++) {
> +        /* The iov entries are partially filled */
> +        size_t towrite = (qsb->iov[cur_iov].iov_len > len) ?
> +                              len :
> +                              qsb->iov[cur_iov].iov_len;

If iov_len was > len, we only wrote part of the current buffer, and we
skip to the next?
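
To make the question concrete: the clamp is only safe if 'len' is treated as "bytes still to send" and is reduced after every iov, so that it can only be smaller than iov_len on the last, partially filled entry. The quoted hunk ends before that part of the loop, so whether the patch does this is not visible here. A minimal standalone sketch of the pattern, with a hypothetical copy_iov_prefix() and a memcpy() standing in for the write to the QEMUFile:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy the first 'len' bytes described by iov[0..n_iov) into 'out'.
 * 'len' counts the bytes still to be copied, so it shrinks on every
 * iteration and the clamp below only triggers on the final entry.
 * Returns the number of bytes copied.
 */
static size_t copy_iov_prefix(const struct iovec *iov, size_t n_iov,
                              size_t len, unsigned char *out)
{
    size_t done = 0;

    for (size_t i = 0; i < n_iov && len > 0; i++) {
        size_t towrite = iov[i].iov_len > len ? len : iov[i].iov_len;

        memcpy(out + done, iov[i].iov_base, towrite);
        done += towrite;
        len -= towrite;
    }
    return done;
}
```

With two 4-byte entries and len = 6, the first entry is copied whole and only 2 bytes are taken from the second.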


		Amit


* Re: [Qemu-devel] [PATCH v7 20/42] Modify save_live_pending for postcopy
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 20/42] Modify save_live_pending for postcopy Dr. David Alan Gilbert (git)
  2015-07-13 11:12   ` Juan Quintela
@ 2015-07-21  6:17   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-21  6:17 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:33], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Modify save_live_pending to return separate postcopiable and
> non-postcopiable counts.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit


* Re: [Qemu-devel] [PATCH v7 21/42] postcopy: OS support test
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 21/42] postcopy: OS support test Dr. David Alan Gilbert (git)
  2015-07-13 11:20   ` Juan Quintela
@ 2015-07-21  7:29   ` Amit Shah
  2015-07-27 17:38     ` Dr. David Alan Gilbert
  2015-08-04  5:28   ` Amit Shah
  2 siblings, 1 reply; 209+ messages in thread
From: Amit Shah @ 2015-07-21  7:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:34], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Provide a check to see if the OS we're running on has all the bits
> needed for postcopy.
> 
> Creates postcopy-ram.c which will get most of the other helpers we need.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> @@ -1165,6 +1166,10 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
>          return -1;
>      }
>  
> +    if (!postcopy_ram_supported_by_host()) {
> +        return -1;
> +    }
> +

So this is just advise: if we receive this, and we can't handle
postcopy, we're going to abort migration?  Shouldn't we continue and
let src know that we can't accept postcopy?  ie the problem is
currently punted to higher levels, should we try to handle it
ourselves?



		Amit


* Re: [Qemu-devel] [PATCH v7 22/42] migrate_start_postcopy: Command to trigger transition to postcopy
  2015-07-13 18:07       ` Juan Quintela
@ 2015-07-21  7:40         ` Amit Shah
  2015-09-24  9:59           ` Dr. David Alan Gilbert
  2015-09-24 14:20         ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 209+ messages in thread
From: Amit Shah @ 2015-07-21  7:40 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, Dr. David Alan Gilbert,
	qemu-devel, luis, pbonzini, david

On (Mon) 13 Jul 2015 [20:07:52], Juan Quintela wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > * Juan Quintela (quintela@redhat.com) wrote:

> >> > +void qmp_migrate_start_postcopy(Error **errp)
> >> > +{
> >> > +    MigrationState *s = migrate_get_current();
> >> > +
> >> > +    if (!migrate_postcopy_ram()) {
> >> > +        error_setg(errp, "Enable postcopy with migration_set_capability before"
> >> > +                         " the start of migration");
> >> > +        return;
> >> > +    }
> >> > +
> >> > +    if (s->state == MIGRATION_STATUS_NONE) {
> >> 
> >> I would claim that this check should be:
> >> 
> >>     if (s->state != MIGRATION_STATUS_ACTIVE) {
> >> ??
> >> 
> >> FAILED, COMPLETED, CANCELL* don't make sense, right?
> >
> > What I'm trying to catch here is people doing:
> >      migrate_start_postcopy
> >      migrate tcp:pppp:whereever
> >
> >   which won't work, because migrate_init reinitialises
> > the flag that start previously set.
> >
> > However, I also don't want to create a race, since what you do is
> > typically:
> >      migrate  tcp:pppp:whereever
> >    <wait some time, get bored>
> >      migrate_start_postcopy
> >
> > if you're unlucky, and the migration finishes just
> > at the same time you do the migrate_start_postcopy, do you
> > want migrate_start_postcopy to fail?  My guess was it
> > was best for it not to fail, in this case.
> 
> Change the order, if it is ACTIVE: do the postcopy thing, otherwise, do
> the clause that is protected now?  Moving to postcopy only makes sense if
> we are in active.

Yeah, I tend to agree, because in the cases where migration has failed
or has been cancelled, we'll end up setting the postcopy bit.  Then,
upon the next migration, this bit could get reused - resulting in the
previous condition of setting postcopy bit before starting migration.
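
A small sketch of the ordering being argued for, as standalone C with stand-in types (MigState, Mig and start_postcopy() are not QEMU names, and the second error string is invented): the flag is only set while the migration is active, so a failed or cancelled migration cannot leave it behind for the next one. Note that this version does fail in the "migration just completed" race described above; that is the trade-off under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the migration state; not the QEMU definitions. */
typedef enum {
    MIG_NONE,
    MIG_SETUP,
    MIG_ACTIVE,
    MIG_COMPLETED,
    MIG_FAILED,
    MIG_CANCELLED,
} MigState;

typedef struct {
    MigState state;
    bool start_postcopy;
} Mig;

/*
 * Returns NULL on success, or an error message.  The flag is only
 * touched when the migration is currently active.
 */
static const char *start_postcopy(Mig *s, bool postcopy_cap_enabled)
{
    if (!postcopy_cap_enabled) {
        return "Enable postcopy with migration_set_capability before"
               " the start of migration";
    }
    if (s->state != MIG_ACTIVE) {
        return "Postcopy can only be started during an active migration";
    }
    s->start_postcopy = true;
    return NULL;
}
```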


		Amit


* Re: [Qemu-devel] [PATCH v7 23/42] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 23/42] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
  2015-07-13 11:27   ` Juan Quintela
@ 2015-07-21 10:33   ` Amit Shah
  2015-09-23 17:04     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 209+ messages in thread
From: Amit Shah @ 2015-07-21 10:33 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:36], Dr. David Alan Gilbert (git) wrote:

> -    if (s->state == MIGRATION_STATUS_ACTIVE ||
> -        s->state == MIGRATION_STATUS_SETUP) {
> +    if (migration_already_active(s)) {

(I know, not introduced here, but:)

A better name is migration_is_active()

> +bool migration_postcopy_phase(MigrationState *s)
> +{
> +    return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +}

And this is better named migration_in_postcopy()

otherwise,

Reviewed-by: Amit Shah <amit.shah@redhat.com>



		Amit


* Re: [Qemu-devel] [PATCH v7 24/42] Add qemu_savevm_state_complete_postcopy
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 24/42] Add qemu_savevm_state_complete_postcopy Dr. David Alan Gilbert (git)
  2015-07-13 11:35   ` Juan Quintela
@ 2015-07-21 10:42   ` Amit Shah
  2015-07-27 17:58     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 209+ messages in thread
From: Amit Shah @ 2015-07-21 10:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:37], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add qemu_savevm_state_complete_postcopy to complement
> qemu_savevm_state_complete_precopy together with a new
> save_live_complete_postcopy method on devices.
> 
> The save_live_complete_precopy method is called on
> all devices during a precopy migration, and all non-postcopy
> devices during a postcopy migration at the transition.
> 
> The save_live_complete_postcopy method is called at
> the end of postcopy for all postcopiable devices.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

But:

> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -938,7 +938,47 @@ int qemu_savevm_state_iterate(QEMUFile *f)
>  static bool should_send_vmdesc(void)
>  {
>      MachineState *machine = MACHINE(qdev_get_machine());
> -    return !machine->suppress_vmdesc;
> +    bool in_postcopy = migration_postcopy_phase(migrate_get_current());
> +    return !machine->suppress_vmdesc && !in_postcopy;
> +}

This should be split in its own patch.


		Amit


* Re: [Qemu-devel] [PATCH v7 25/42] Postcopy: Maintain sentmap and calculate discard
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 25/42] Postcopy: Maintain sentmap and calculate discard Dr. David Alan Gilbert (git)
  2015-07-13 11:47   ` Juan Quintela
@ 2015-07-21 11:36   ` Amit Shah
  2015-07-31 16:51     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 209+ messages in thread
From: Amit Shah @ 2015-07-21 11:36 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:38], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Where postcopy is preceded by a period of precopy, the destination will
> have received pages that may have been dirtied on the source after the
> page was sent.  The destination must throw these pages away before
> starting its CPUs.
> 
> Maintain a 'sentmap' of pages that have already been sent.
> Calculate list of sent & dirty pages
> Provide helpers on the destination side to discard these.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

Some whitespace issues, and some sentences in comments don't have a
full-stop:

> +/*
> + * Called by the bitmap code for each chunk to discard
> + * May send a discard message, may just leave it queued to
> + * be sent later
> + * 'start' and 'end' describe an inclusive range of pages in the
> + * migration bitmap in the RAM block passed to postcopy_discard_send_init
> + */
> +void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
> +                                unsigned long start, unsigned long end);

unaligned line; no full-stop in comment above (similar elsewhere, not
repeating that).

> +/*
> + * Discard the contents of memory start..end inclusive.
> + * We can assume that if we've been called postcopy_ram_hosttest returned true
> + */
> +int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
> +                               uint8_t *end)
> +{
> +    trace_postcopy_ram_discard_range(start, end);
> +    if (madvise(start, (end-start)+1, MADV_DONTNEED)) {

whitespace around operators
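
As a reference for the spacing and for the inclusive-range arithmetic, a standalone Linux sketch of the same call. discard_range() is a hypothetical name, and the tracing and error reporting of the real function are left out. For a private anonymous mapping, MADV_DONTNEED makes the next access see zero-filled pages, which is the behaviour the discard relies on.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Discard the contents of memory start..end inclusive.
 * 'start' must be page aligned.  Returns 0 on success, -1 on failure.
 */
static int discard_range(uint8_t *start, uint8_t *end)
{
    /* The range is inclusive, hence the + 1. */
    size_t length = (size_t)(end - start) + 1;

    if (madvise(start, length, MADV_DONTNEED)) {
        return -1;
    }
    return 0;
}
```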

> +/*
> + * Called by the bitmap code for each chunk to discard
> + * May send a discard message, may just leave it queued to
> + * be sent later
> + * 'start' and 'end' describe an inclusive range of pages in the
> + * migration bitmap in the RAM block passed to postcopy_discard_send_init

missing punctuation

(also, you had started doing doxygen-style comments, want to keep on
following that style?)

> +static RAMBlock *ram_find_block(const char *id)

just a suggestion, not very particular about this:  rename to

ram_find_block_by_id()

instead, so that it's clear what method of finding we're using; also
no name conflicts when there might be other ways of doing a find.


		Amit


* Re: [Qemu-devel] [PATCH v7 26/42] postcopy: Incoming initialisation
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 26/42] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
  2015-07-13 12:04   ` Juan Quintela
@ 2015-07-22  6:19   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-22  6:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:39], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit


* Re: [Qemu-devel] [PATCH v7 27/42] postcopy: ram_enable_notify to switch on userfault
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 27/42] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
  2015-07-13 12:10   ` Juan Quintela
@ 2015-07-23  5:22   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-23  5:22 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:40], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Mark the area of RAM as 'userfault'
> Start up a fault-thread to handle any userfaults we might receive
> from it (to be filled in later)
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit


* Re: [Qemu-devel] [PATCH v7 28/42] Postcopy: Postcopy startup in migration thread
  2015-07-13 17:56     ` Dr. David Alan Gilbert
  2015-07-13 18:09       ` Juan Quintela
@ 2015-07-23  5:53       ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-23  5:53 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, Juan Quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Mon) 13 Jul 2015 [18:56:55], Dr. David Alan Gilbert wrote:
> * Juan Quintela (quintela@redhat.com) wrote:

> > > +    /*
> > > +     * send rest of state - note things that are doing postcopy
> > > +     * will notice we're in POSTCOPY_ACTIVE and not actually
> > > +     * wrap their state up here
> > > +     */
> > > +    qemu_file_set_rate_limit(ms->file, INT64_MAX);
> > 
> > Do we undo this?  or, are we sure that it is ok to maximize network
> > output?
> 
> No we don't undo it;  it's a good question what we can do better.
> I'm trying to avoid delaying the postcopy-requested pages; ideally
> I'd like to separate those out so they get satisfied but still
> meet the bandwidth limit for the background transfer.
> The ideal is separate fd's, however something else I've considered
> is getting incoming postcopy requests to wake the outgoing side
> up when it's sleeping for the bandwidth limit, although I've
> not tried implementing that yet.

Might be a conflict in the knobs we expose (max_bandwidth) and us not
adhering to that.

I agree we want this to go full-throttle, so maybe document that
postcopy will override that knob?  It's tricky to get everyone to
understand that postcopy will do that.  Plus there'll be other
questions like what else does postcopy override? -- not that there's
anything more, but users will wonder.

		Amit


* Re: [Qemu-devel] [PATCH v7 28/42] Postcopy: Postcopy startup in migration thread
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 28/42] Postcopy: Postcopy startup in migration thread Dr. David Alan Gilbert (git)
  2015-07-13 12:56   ` Juan Quintela
@ 2015-07-23  5:55   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-23  5:55 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:41], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Rework the migration thread to setup and start postcopy.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit


* Re: [Qemu-devel] [PATCH v7 29/42] Postcopy end in migration_thread
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 29/42] Postcopy end in migration_thread Dr. David Alan Gilbert (git)
  2015-07-13 13:15   ` Juan Quintela
@ 2015-07-23  6:41   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-23  6:41 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:42], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> The end of migration in postcopy is a bit different since some of
> the things normally done at the end of migration have already been
> done on the transition to postcopy.
> 
> The end of migration code is getting a bit complicated now, so
> move out into its own function.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit


* Re: [Qemu-devel] [PATCH v7 29/42] Postcopy end in migration_thread
  2015-07-13 13:15   ` Juan Quintela
@ 2015-07-23  6:41     ` Amit Shah
  2015-08-04 11:31     ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-23  6:41 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, Dr. David Alan Gilbert (git),
	qemu-devel, luis, pbonzini, david

On (Mon) 13 Jul 2015 [15:15:07], Juan Quintela wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > The end of migration in postcopy is a bit different since some of
> > the things normally done at the end of migration have already been
> > done on the transition to postcopy.
> >
> > The end of migration code is getting a bit complicated now, so
> > move out into its own function.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> I think that I would split the function and then add the postcopy code.

Yeah, esp since this code was added / modified in the previous patch.

		Amit


* Re: [Qemu-devel] [PATCH v7 30/42] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 30/42] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command Dr. David Alan Gilbert (git)
  2015-07-13 13:24   ` Juan Quintela
@ 2015-07-23  6:50   ` Amit Shah
  2015-08-06 14:21     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 209+ messages in thread
From: Amit Shah @ 2015-07-23  6:50 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:43], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add MIG_RP_MSG_REQ_PAGES command on Return path for the postcopy
> destination to request a page from the source.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -113,6 +113,36 @@ static void deferred_incoming_migration(Error **errp)
>      deferred_incoming = true;
>  }
>  
> +/* Request a range of pages from the source VM at the given
> + * start address.
> + *   rbname: Name of the RAMBlock to request the page in, if NULL it's the same
> + *           as the last request (a name must have been given previously)

Why not just send the name all the time?

> @@ -1010,6 +1058,28 @@ static void *source_return_path_thread(void *opaque)
>              trace_source_return_path_thread_pong(tmp32);
>              break;
>  
> +        case MIG_RP_MSG_REQ_PAGES:
> +            start = be64_to_cpup((uint64_t *)buf);
> +            len = be64_to_cpup(((uint64_t *)buf)+1);
> +            tmpstr = NULL;
> +            if (len & 1) {
> +                len -= 1; /* Remove the flag */
> +                /* Now we expect an idstr */
> +                tmp32 = buf[16]; /* Length of the following idstr */
> +                tmpstr = (char *)&buf[17];
> +                buf[17+tmp32] = '\0';
> +                expected_len = 16+1+tmp32;

Whitespace missing around operators

> +            } else {
> +                expected_len = 16;
> +            }

This else can be removed if expected_len is set before the if
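
Roughly what I have in mind, as a standalone sketch rather than a drop-in replacement. ReqPages, parse_req_pages() and get_be64() are hypothetical names; the message layout (two big-endian 64-bit words, bit 0 of the length flagging a trailing length-prefixed name) is taken from the hunk above. Reading the words bytewise also avoids casting buf to uint64_t *, which is a side choice of this sketch and not something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 64-bit value bytewise. */
static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Hypothetical container for a decoded page request. */
typedef struct {
    uint64_t start;
    uint64_t len;
    const char *rbname;  /* NULL: same RAMBlock as the last request */
    size_t expected_len; /* message length implied by the contents */
} ReqPages;

/*
 * Decode a page request in place.  The caller must guarantee that
 * buf has room for the terminator written at buf[17 + name length].
 */
static void parse_req_pages(uint8_t *buf, ReqPages *req)
{
    req->start = get_be64(buf);
    req->len = get_be64(buf + 8);
    req->rbname = NULL;
    req->expected_len = 16; /* default, so no else branch is needed */

    if (req->len & 1) {
        uint8_t name_len = buf[16]; /* Length of the following idstr */

        req->len -= 1; /* Remove the flag */
        req->rbname = (const char *)&buf[17];
        buf[17 + name_len] = '\0';
        req->expected_len = 16 + 1 + name_len;
    }
}
```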


		Amit


* Re: [Qemu-devel] [PATCH v7 31/42] Page request: Process incoming page request
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 31/42] Page request: Process incoming page request Dr. David Alan Gilbert (git)
  2015-07-14  9:18   ` Juan Quintela
@ 2015-07-23 12:23   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-23 12:23 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:44], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> On receiving MIG_RPCOMM_REQ_PAGES look up the address and
> queue the page.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit


* Re: [Qemu-devel] [PATCH v7 32/42] Page request: Consume pages off the post-copy queue
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 32/42] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
  2015-07-14  9:40   ` Juan Quintela
@ 2015-07-27  6:05   ` Amit Shah
  2015-09-16 18:48     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 209+ messages in thread
From: Amit Shah @ 2015-07-27  6:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:45], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> When transmitting RAM pages, consume pages that have been queued by
> MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.

It's slightly confusing with 'consume': we're /servicing/ requests from
the dest at the src here rather than /consuming/ pages sent by src at
the dest.  If you find 'service' better than 'consume', please update
the commit msg+log.

> Note:
>   a) After a queued page the linear walk carries on from after the
> unqueued page; there is a reasonable chance that the destination
> was about to ask for other closeby pages anyway.
> 
>   b) We have to be careful of any assumptions that the page walking
> code makes, in particular it does some short cuts on its first linear
> walk that break as soon as we do a queued page.
> 
>   c) We have to be careful to not break up host-page size chunks, since
> this makes it harder to place the pages on the destination.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

> +static int ram_save_host_page(MigrationState *ms, QEMUFile *f, RAMBlock* block,
> +                              ram_addr_t *offset, bool last_stage,
> +                              uint64_t *bytes_transferred,
> +                              ram_addr_t dirty_ram_abs)
> +{
> +    int tmppages, pages = 0;
> +    do {
> +        /* Check the page is dirty and, if it is, send it */
> +        if (migration_bitmap_clear_dirty(dirty_ram_abs)) {
> +            if (compression_switch && migrate_use_compression()) {
> +                tmppages = ram_save_compressed_page(f, block, *offset,
> +                                                    last_stage,
> +                                                    bytes_transferred);
> +            } else {
> +                tmppages = ram_save_page(f, block, *offset, last_stage,
> +                                         bytes_transferred);
> +            }

Something for the future: we should just have ram_save_page which does
compression (or not); and even encryption (or not), and so on.

> +
> +            if (tmppages < 0) {
> +                return tmppages;
> +            } else {
> +                if (ms->sentmap) {
> +                    set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
> +                }
> +            }

This else could be dropped as the if stmt returns.
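
For clarity, the shape I mean, as a simplified standalone sketch and not the patch's code. save_host_page() and send_page_stub() are hypothetical, the dirty bitmap and sentmap are plain bool arrays indexed by target page, and a single offset is used for both the block offset and the bitmap index. TARGET_PAGE_BITS is fixed at 12 here purely for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TARGET_PAGE_BITS 12
#define TARGET_PAGE_SIZE ((uint64_t)1 << TARGET_PAGE_BITS)

/* Stand-in for ram_save_page(): pretend one page was sent. */
static int send_page_stub(uint64_t offset)
{
    (void)offset;
    return 1;
}

/*
 * Send every dirty target page within the host page containing
 * *offset.  Returns the number of pages sent, or a negative value on
 * error.  On return *offset is the last target page looked at.
 */
static int save_host_page(uint64_t *offset, uint64_t host_page_size,
                          bool *dirty, bool *sentmap,
                          int (*send_page)(uint64_t offset))
{
    int pages = 0;

    do {
        uint64_t idx = *offset >> TARGET_PAGE_BITS;

        if (dirty[idx]) {
            int tmppages;

            dirty[idx] = false;
            tmppages = send_page(*offset);
            if (tmppages < 0) {
                return tmppages;
            }
            /* No else needed: the error path has already returned. */
            if (sentmap) {
                sentmap[idx] = true;
            }
            pages += tmppages;
        }
        *offset += TARGET_PAGE_SIZE;
    } while (*offset & (host_page_size - 1));

    /* The offset we leave with is the last one we looked at */
    *offset -= TARGET_PAGE_SIZE;
    return pages;
}
```

With a 16 KiB host page and 4 KiB target pages, one call walks four target pages and sends only the dirty ones.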

> +            pages += tmppages;
> +        }
> +        *offset += TARGET_PAGE_SIZE;
> +        dirty_ram_abs += TARGET_PAGE_SIZE;
> +    } while (*offset & (qemu_host_page_size - 1));
> +
> +    /* The offset we leave with is the last one we looked at */
> +    *offset -= TARGET_PAGE_SIZE;
> +    return pages;
> +}
> +
> +/**
>   * ram_find_and_save_block: Finds a dirty page and sends it to f
>   *
>   * Called within an RCU critical section.
> @@ -997,65 +1094,102 @@ err:
>   * @f: QEMUFile where to send the data
>   * @last_stage: if we are at the completion stage
>   * @bytes_transferred: increase it with the number of transferred bytes
> + *
> + * On systems where host-page-size > target-page-size it will send all the
> + * pages in a host page that are dirty.
>   */
>  
>  static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
>                                     uint64_t *bytes_transferred)
>  {
> +    MigrationState *ms = migrate_get_current();
>      RAMBlock *block = last_seen_block;
> +    RAMBlock *tmpblock;
>      ram_addr_t offset = last_offset;
> +    ram_addr_t tmpoffset;
>      bool complete_round = false;
>      int pages = 0;
> -    MemoryRegion *mr;
>      ram_addr_t dirty_ram_abs; /* Address of the start of the dirty page in
>                                   ram_addr_t space */
>  
> -    if (!block)
> +    if (!block) {
>          block = QLIST_FIRST_RCU(&ram_list.blocks);
> +        last_was_from_queue = false;
> +    }
>  
> -    while (true) {
> -        mr = block->mr;
> -        offset = migration_bitmap_find_and_reset_dirty(mr, offset,
> -                                                       &dirty_ram_abs);
> -        if (complete_round && block == last_seen_block &&
> -            offset >= last_offset) {
> -            break;
> -        }
> -        if (offset >= block->used_length) {
> -            offset = 0;
> -            block = QLIST_NEXT_RCU(block, next);
> -            if (!block) {
> -                block = QLIST_FIRST_RCU(&ram_list.blocks);
> -                complete_round = true;
> -                ram_bulk_stage = false;
> -                if (migrate_use_xbzrle()) {
> -                    /* If xbzrle is on, stop using the data compression at this
> -                     * point. In theory, xbzrle can do better than compression.
> -                     */
> -                    flush_compressed_data(f);
> -                    compression_switch = false;
> -                }
> +    while (true) { /* Until we send a block or run out of stuff to send */
> +        tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &dirty_ram_abs);
> +
> +        if (tmpblock) {
> +            /* We've got a block from the postcopy queue */
> +            trace_ram_find_and_save_block_postcopy(tmpblock->idstr,
> +                                                   (uint64_t)tmpoffset,
> +                                                   (uint64_t)dirty_ram_abs);
> +            /*
> +             * We're sending this page, and since it's postcopy nothing else
> +             * will dirty it, and we must make sure it doesn't get sent again
> +             * even if this queue request was received after the background
> +             * search already sent it.
> +             */
> +            if (!test_bit(dirty_ram_abs >> TARGET_PAGE_BITS,
> +                          migration_bitmap)) {
> +                trace_ram_find_and_save_block_postcopy_not_dirty(
> +                    tmpblock->idstr, (uint64_t)tmpoffset,
> +                    (uint64_t)dirty_ram_abs,
> +                    test_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap));
> +
> +                continue;
>              }
> +            /*
> +             * As soon as we start servicing pages out of order, then we have
> +             * to kill the bulk stage, since the bulk stage assumes
> +             * in (migration_bitmap_find_and_reset_dirty) that every page is
> +             * dirty, that's no longer true.
> +             */
> +            ram_bulk_stage = false;
> +            /*
> +             * We want the background search to continue from the queued page
> +             * since the guest is likely to want other pages near to the page
> +             * it just requested.
> +             */
> +            block = tmpblock;
> +            offset = tmpoffset;
>          } else {
> -            if (compression_switch && migrate_use_compression()) {
> -                pages = ram_save_compressed_page(f, block, offset, last_stage,
> -                                                 bytes_transferred);
> -            } else {
> -                pages = ram_save_page(f, block, offset, last_stage,
> -                                      bytes_transferred);
> +            MemoryRegion *mr;
> +            /* priority queue empty, so just search for something dirty */
> +            mr = block->mr;
> +            offset = migration_bitmap_find_dirty(mr, offset, &dirty_ram_abs);
> +            if (complete_round && block == last_seen_block &&
> +                offset >= last_offset) {
> +                break;
>              }
> -
> -            /* if page is unmodified, continue to the next */
> -            if (pages > 0) {
> -                MigrationState *ms = migrate_get_current();
> -                if (ms->sentmap) {
> -                    set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
> +            if (offset >= block->used_length) {
> +                offset = 0;
> +                block = QLIST_NEXT_RCU(block, next);
> +                if (!block) {
> +                    block = QLIST_FIRST_RCU(&ram_list.blocks);
> +                    complete_round = true;
> +                    ram_bulk_stage = false;
> +                    if (migrate_use_xbzrle()) {
> +                        /* If xbzrle is on, stop using the data compression at
> +                         * this point. In theory, xbzrle can do better than
> +                         * compression.
> +                         */
> +                        flush_compressed_data(f);
> +                        compression_switch = false;
> +                    }
>                  }
> -
> -                last_sent_block = block;
> -                break;
> +                continue; /* pick an offset in the new block */
>              }
>          }
> +
> +        pages = ram_save_host_page(ms, f, block, &offset, last_stage,
> +                                   bytes_transferred, dirty_ram_abs);
> +
> +        /* if page is unmodified, continue to the next */
> +        if (pages > 0) {
> +            break;
> +        }

This function could use splitting into multiple ones.
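
A rough sketch of one way the "move to the next block" step might be
pulled out, purely as an illustration.  ex_cursor and ex_next_block are
made-up names, the block list is simplified to an index, and the caller
would still have to deal with ram_bulk_stage and the xbzrle/compression
switch when the search wraps:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cursor over an array of RAM blocks; not a QEMU type. */
struct ex_cursor {
    size_t block;        /* index of the block being searched */
    size_t offset;       /* offset within that block */
    bool complete_round; /* set once the search has wrapped to the first block */
};

/*
 * Move to the start of the next block, wrapping to the first block at the
 * end of the list.  Returns true if this call wrapped.
 */
static bool ex_next_block(struct ex_cursor *c, size_t n_blocks)
{
    c->offset = 0;
    c->block++;
    if (c->block >= n_blocks) {
        c->block = 0;
        c->complete_round = true;
        return true;
    }
    return false;
}
```

With something like that in place the main loop would only decide
between the queued-page path and the background search.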


		Amit

* Re: [Qemu-devel] [PATCH v7 33/42] postcopy_ram.c: place_page and helpers
  2015-07-14 10:05   ` Juan Quintela
@ 2015-07-27  6:11     ` Amit Shah
  2015-09-23 16:45     ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-27  6:11 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, Dr. David Alan Gilbert (git),
	qemu-devel, luis, pbonzini, david

On (Tue) 14 Jul 2015 [12:05:33], Juan Quintela wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:

> > +/*
> > + * Place a host page (from) at (host) atomically
> > + * all_zero: Hint that the page being placed is 0 throughout
> > + * returns 0 on success
> > + */
> > +int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> > +                        bool all_zero)
> 
> postcopy_place_page() and postcopy_place_zero_page()?  They just share a
> trace point :p

Yea, I thought the same.

		Amit

* Re: [Qemu-devel] [PATCH v7 33/42] postcopy_ram.c: place_page and helpers
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 33/42] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
  2015-07-14 10:05   ` Juan Quintela
@ 2015-07-27  6:11   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-27  6:11 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:46], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> postcopy_place_page (etc) provide a way for postcopy to place a page
> into the guest's memory atomically (using the copy ioctl on the ufd).
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit

* Re: [Qemu-devel] [PATCH v7 34/42] Postcopy: Use helpers to map pages during migration
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 34/42] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
  2015-07-14 12:34   ` Juan Quintela
@ 2015-07-27  7:39   ` Amit Shah
  2015-08-06 11:22     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 209+ messages in thread
From: Amit Shah @ 2015-07-27  7:39 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:47], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> In postcopy, the destination guest is running at the same time
> as it's receiving pages; as we receive new pages we must put
> them into the guest's address space atomically to avoid a running
> CPU accessing a partially written page.
> 
> Use the helpers in postcopy-ram.c to map these pages.
> 
> qemu_get_buffer_less_copy is used to avoid a copy out of qemu_file
> in the case that postcopy is going to do a copy anyway.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>


> @@ -1881,6 +1890,16 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>      int flags = 0, ret = 0;
>      static uint64_t seq_iter;
>      int len = 0;
> +    /*
> +     * System is running in postcopy mode, page inserts to host memory must be
> +     * atomic
> +     */

*If* system is running in postcopy mode ....

> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    bool postcopy_running = postcopy_state_get(mis) >=
> +                            POSTCOPY_INCOMING_LISTENING;
> +    void *postcopy_host_page = NULL;
> +    bool postcopy_place_needed = false;
> +    bool matching_page_sizes = qemu_host_page_size == TARGET_PAGE_SIZE;
>  
>      seq_iter++;
>  
> @@ -1896,13 +1915,57 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
>      rcu_read_lock();
>      while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
>          ram_addr_t addr, total_ram_bytes;
> -        void *host;
> +        void *host = 0;
> +        void *page_buffer = 0;
> +        void *postcopy_place_source = 0;
>          uint8_t ch;
> +        bool all_zero = false;
>  
>          addr = qemu_get_be64(f);
>          flags = addr & ~TARGET_PAGE_MASK;
>          addr &= TARGET_PAGE_MASK;
>  
> +        if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
> +                     RAM_SAVE_FLAG_XBZRLE)) {
> +            host = host_from_stream_offset(f, mis, addr, flags);
> +            if (!host) {
> +                error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
> +                ret = -EINVAL;
> +                break;
> +            }

So the host_from_stream_offset call was moved here from below.  One
invocation below is still left, which is a bug.

> +            if (!postcopy_running) {
> +                page_buffer = host;
> +            } else {

Instead of this, can we just do:

	   page_buffer = host;
	   if (postcopy_running) {

> +                /*
> +                 * Postcopy requires that we place whole host pages atomically.
> +                 * To make it atomic, the data is read into a temporary page
> +                 * that's moved into place later.
> +                 * The migration protocol uses,  possibly smaller, target-pages
> +                 * however the source ensures it always sends all the components
> +                 * of a host page in order.
> +                 */
> +                if (!postcopy_host_page) {
> +                    postcopy_host_page = postcopy_get_tmp_page(mis);
> +                }
> +                page_buffer = postcopy_host_page +
> +                              ((uintptr_t)host & ~qemu_host_page_mask);
> +                /* If all TP are zero then we can optimise the place */
> +                if (!((uintptr_t)host & ~qemu_host_page_mask)) {
> +                    all_zero = true;
> +                }
> +
> +                /*
> +                 * If it's the last part of a host page then we place the host
> +                 * page
> +                 */
> +                postcopy_place_needed = (((uintptr_t)host + TARGET_PAGE_SIZE) &
> +                                         ~qemu_host_page_mask) == 0;
> +                postcopy_place_source = postcopy_host_page;
> +            }
> +        } else {
> +            postcopy_place_needed = false;
> +        }

... and similar for postcopy_place_needed as well?  It becomes much
easier to read.
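
To make the mask arithmetic in the hunk above concrete, here is a small
standalone sketch.  It is not the patch's code: ex_classify, ex_place
and EX_TARGET_PAGE_SIZE are invented for the example, and the host page
size is passed in rather than taken from qemu_host_page_mask:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TARGET_PAGE_SIZE. */
#define EX_TARGET_PAGE_SIZE 4096u

struct ex_place {
    size_t offset_in_host_page; /* where the target page lands in the tmp page */
    bool first_in_host_page;    /* first target page of a host page */
    bool place_needed;          /* last target page: place the whole host page */
};

/* host_page_size must be a power of two and >= EX_TARGET_PAGE_SIZE. */
static struct ex_place ex_classify(uintptr_t host, uintptr_t host_page_size)
{
    uintptr_t host_page_mask = ~(host_page_size - 1);
    struct ex_place p;

    p.offset_in_host_page = host & ~host_page_mask;
    p.first_in_host_page = p.offset_in_host_page == 0;
    p.place_needed = ((host + EX_TARGET_PAGE_SIZE) & ~host_page_mask) == 0;
    return p;
}
```

When the host and target page sizes match, every page is both the first
and the last part of its host page, so place_needed is always true.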

>          case RAM_SAVE_FLAG_COMPRESS_PAGE:
> -            host = host_from_stream_offset(f, addr, flags);
> +            all_zero = false;
> +            if (postcopy_running) {
> +                error_report("Compressed RAM in postcopy mode @%zx\n", addr);
> +                return -EINVAL;
> +            }
> +            host = host_from_stream_offset(f, mis, addr, flags);

This line should go (as mentioned above)?


		Amit

* Re: [Qemu-devel] [PATCH v7 35/42] Don't sync dirty bitmaps in postcopy
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 35/42] Don't sync dirty bitmaps in postcopy Dr. David Alan Gilbert (git)
  2015-07-14 12:36   ` Juan Quintela
@ 2015-07-27  7:43   ` Amit Shah
  2015-07-31  9:50     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 209+ messages in thread
From: Amit Shah @ 2015-07-27  7:43 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:48], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Once we're in postcopy the source processors are stopped and memory
> shouldn't change any more, so there's no need to look at the dirty
> map.
> 
> There are two notes to this:
>   1) If we do resync and a page had changed then the page would get
>      sent again, which the destination wouldn't allow (since it might
>      have also modified the page)
>   2) Before disabling this I'd seen very rare cases where a page had been
>      marked dirtied although the memory contents are apparently identical

I suppose we don't know why.  Any way to send a message to the dest
with this info, so the dest can print out something?  That'll help in
debugging.  (I'm suggesting sending a message to the dest, because
after a migration, we don't ever think of looking at messages on the
src.  And chances are the dest could blow up after a migration is
successful because of such "corruption".)

> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit

* Re: [Qemu-devel] [PATCH v7 18/42] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
  2015-07-20 10:06   ` Amit Shah
@ 2015-07-27  9:55     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-27  9:55 UTC (permalink / raw)
  To: Amit Shah
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 16 Jun 2015 [11:26:31], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > The state of the postcopy process is managed via a series of messages;
> >    * Add wrappers and handlers for sending/receiving these messages
> >    * Add a state variable that tracks the current state of postcopy
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Reviewed-by: Amit Shah <amit.shah@redhat.com>

Thanks,

> But:
> 
> > +void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
> > +                                           uint16_t len,
> > +                                           uint64_t *start_list,
> > +                                           uint64_t *end_list)
> > +{
> > +    uint8_t *buf;
> > +    uint16_t tmplen;
> > +    uint16_t t;
> > +    size_t name_len = strlen(name);
> > +
> > +    trace_qemu_savevm_send_postcopy_ram_discard(name, len);
> > +    buf = g_malloc0(len*16 + name_len + 3);
> > +    buf[0] = 0; /* Version */
> > +    assert(name_len < 256);
> > +    buf[1] = name_len;
> > +    memcpy(buf+2, name, name_len);
> > +    tmplen = 2+name_len;
> > +    buf[tmplen++] = '\0';
> 
> whitespace around operators missing
> 
> > +static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
> > +                                              uint16_t len)
> 
> > +    len -= 3+strlen(ramid);
> 
> ditto

Fixed.
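
For reference, the header layout from the quoted hunk with the operator
spacing applied looks roughly like the sketch below.  This is an
illustration only: ex_discard_header and ex_discard_size are made-up
names, the bounds check is an addition, and the 16 bytes per range are
read off the size computation as a 64-bit start plus a 64-bit end:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Write the message header: version byte, name length byte, the name and a
 * terminating NUL.  Returns the number of bytes written, or 0 if the name
 * is too long or the buffer too small.
 */
static size_t ex_discard_header(uint8_t *buf, size_t buf_size, const char *name)
{
    size_t name_len = strlen(name);
    size_t pos;

    if (name_len >= 256 || buf_size < name_len + 3) {
        return 0;
    }
    buf[0] = 0;                  /* version */
    buf[1] = (uint8_t)name_len;
    memcpy(buf + 2, name, name_len);
    pos = 2 + name_len;
    buf[pos++] = '\0';
    return pos;
}

/* Total message size for 'len' ranges of 16 bytes each, plus the header. */
static size_t ex_discard_size(const char *name, uint16_t len)
{
    return (size_t)len * 16 + strlen(name) + 3;
}
```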

Dave

> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 37/42] Postcopy; Handle userfault requests
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 37/42] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
  2015-07-14 15:10   ` Juan Quintela
@ 2015-07-27 14:29   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-27 14:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:50], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> userfaultfd is a Linux syscall that gives an fd that receives a stream
> of notifications of accesses to pages registered with it and allows
> the program to acknowledge those stalls and tell the accessing
> thread to carry on.
> 
> We convert the requests from the kernel into messages back to the
> source asking for the pages.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit

* Re: [Qemu-devel] [PATCH v7 19/42] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream
  2015-07-21  6:11   ` Amit Shah
@ 2015-07-27 17:28     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-27 17:28 UTC (permalink / raw)
  To: Amit Shah
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 16 Jun 2015 [11:26:32], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > MIG_CMD_PACKAGED is a migration command that wraps a chunk of migration
> > stream inside a package whose length can be determined purely by reading
> > its header.  The destination guarantees that the whole MIG_CMD_PACKAGED
> > is read off the stream prior to parsing the contents.
> > 
> > This is used by postcopy to load device state (from the package)
> > while leaving the main stream free to receive memory pages.
> 
> Not sure why this is necessary.  I suppose I'll have to go read the
> documentation in patch 1..

Yep - or one of the previous replies where I explained it.

> However:
> 
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -718,6 +718,50 @@ void qemu_savevm_send_open_return_path(QEMUFile *f)
> >      qemu_savevm_command_send(f, MIG_CMD_OPEN_RETURN_PATH, 0, NULL);
> >  }
> >  
> > +/* We have a buffer of data to send; we don't want that all to be loaded
> > + * by the command itself, so the command contains just the length of the
> > + * extra buffer that we then send straight after it.
> > + * TODO: Must be a better way to organise that
> > + *
> > + * Returns:
> > + *    0 on success
> > + *    -ve on error
> > + */
> > +int qemu_savevm_send_packaged(QEMUFile *f, const QEMUSizedBuffer *qsb)
> > +{
> > +    size_t cur_iov;
> > +    size_t len = qsb_get_length(qsb);
> > +    uint32_t tmp;
> > +
> > +    if (len > MAX_VM_CMD_PACKAGED_SIZE) {
> > +        error_report("%s: Unreasonably large packaged state: %zu",
> > +                     __func__, len);
> > +        return -1;
> > +    }
> > +
> > +    tmp = cpu_to_be32(len);
> > +
> > +    trace_qemu_savevm_send_packaged();
> > +    qemu_savevm_command_send(f, MIG_CMD_PACKAGED, 4, (uint8_t *)&tmp);
> > +
> > +    /* all the data follows (concatenating the iov's) */
> > +    for (cur_iov = 0; cur_iov < qsb->n_iov; cur_iov++) {
> > +        /* The iov entries are partially filled */
> > +        size_t towrite = (qsb->iov[cur_iov].iov_len > len) ?
> > +                              len :
> > +                              qsb->iov[cur_iov].iov_len;
> 
> If iov_len was > len, we only wrote part of the current buffer, and we
> skip to the next?

Yes; this is just the end case.  The qsb allocates iov entries in 'chunks',
but the data that gets added often doesn't use the whole chunk.
'len' (set above from qsb_get_length) gives the used contents of
the qsb, and that's all we want to write.  This is normally the case
on the last entry in the qsb anyway; however, since 'len' gets
decremented by the amount written, we might go once more around the loop,
in which case the 'if (!towrite) { break; }' breaks us out of the loop instead.
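
A standalone sketch of that end case, assuming plain struct iovec
entries and a flat output buffer in place of the QEMUFile (ex_write_used
is a made-up name, not a function in the series):

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy the first 'len' used bytes held in 'iov' into 'out', which must have
 * room for at least 'len' bytes.  Returns the number of bytes copied.
 */
static size_t ex_write_used(const struct iovec *iov, size_t n_iov,
                            size_t len, unsigned char *out)
{
    size_t written = 0;
    size_t cur;

    for (cur = 0; cur < n_iov; cur++) {
        /* The last used entry is usually only partially filled. */
        size_t towrite = iov[cur].iov_len > len ? len : iov[cur].iov_len;

        if (!towrite) {
            break; /* all used bytes are out; the rest of the chunks are empty */
        }
        memcpy(out + written, iov[cur].iov_base, towrite);
        written += towrite;
        len -= towrite;
    }
    return written;
}
```

With three 4-byte chunks and len = 6 this copies 4 bytes, then 2, and
then breaks on the third entry because towrite is 0.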

Dave

> 
> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 21/42] postcopy: OS support test
  2015-07-21  7:29   ` Amit Shah
@ 2015-07-27 17:38     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-27 17:38 UTC (permalink / raw)
  To: Amit Shah
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 16 Jun 2015 [11:26:34], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Provide a check to see if the OS we're running on has all the bits
> > needed for postcopy.
> > 
> > Creates postcopy-ram.c which will get most of the other helpers we need.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> > @@ -1165,6 +1166,10 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
> >          return -1;
> >      }
> >  
> > +    if (!postcopy_ram_supported_by_host()) {
> > +        return -1;
> > +    }
> > +
> 
> So this is just advise: if we receive this, and we can't handle
> postcopy, we're going to abort migration?  Shouldn't we continue and
> let src know that we can't accept postcopy?  ie the problem is
> currently punted to higher levels, should we try to handle it
> ourselves?

We could, although it happens right at the start of migration, and it only
happens if you explicitly enabled the postcopy capability, so I'm not
sure there's any advantage in trying to fall back when you've been told
to use it.

Dave

> 
> 
> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 24/42] Add qemu_savevm_state_complete_postcopy
  2015-07-21 10:42   ` Amit Shah
@ 2015-07-27 17:58     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-27 17:58 UTC (permalink / raw)
  To: Amit Shah
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 16 Jun 2015 [11:26:37], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Add qemu_savevm_state_complete_postcopy to complement
> > qemu_savevm_state_complete_precopy together with a new
> > save_live_complete_postcopy method on devices.
> > 
> > The save_live_complete_precopy method is called on
> > all devices during a precopy migration, and all non-postcopy
> > devices during a postcopy migration at the transition.
> > 
> > The save_live_complete_postcopy method is called at
> > the end of postcopy for all postcopiable devices.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
> 
> But:
> 
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -938,7 +938,47 @@ int qemu_savevm_state_iterate(QEMUFile *f)
> >  static bool should_send_vmdesc(void)
> >  {
> >      MachineState *machine = MACHINE(qdev_get_machine());
> > -    return !machine->suppress_vmdesc;
> > +    bool in_postcopy = migration_postcopy_phase(migrate_get_current());
> > +    return !machine->suppress_vmdesc && !in_postcopy;
> > +}
> 
> This should be split in its own patch.

Thanks, split.

Dave
> 
> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 39/42] postcopy: Wire up loadvm_postcopy_handle_ commands
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 39/42] postcopy: Wire up loadvm_postcopy_handle_ commands Dr. David Alan Gilbert (git)
  2015-07-14 15:14   ` Juan Quintela
@ 2015-07-28  5:53   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-28  5:53 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:52], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Wire up more of the handlers for the commands on the destination side,
> in particular loadvm_postcopy_handle_run now has enough to start the
> guest running.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit

* Re: [Qemu-devel] [PATCH v7 40/42] End of migration for postcopy
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 40/42] End of migration for postcopy Dr. David Alan Gilbert (git)
  2015-07-14 15:15   ` Juan Quintela
@ 2015-07-28  5:55   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-28  5:55 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:53], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Tweak the end of migration cleanup; we don't want to close stuff down
> at the end of the main stream, since the postcopy is still sending pages
> on the other thread.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>


		Amit

* Re: [Qemu-devel] [PATCH v7 41/42] Disable mlock around incoming postcopy
  2015-07-14 15:22   ` Juan Quintela
@ 2015-07-28  6:02     ` Amit Shah
  2015-07-28 11:32       ` Juan Quintela
  2015-09-24 10:36     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 209+ messages in thread
From: Amit Shah @ 2015-07-28  6:02 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, Dr. David Alan Gilbert (git),
	qemu-devel, luis, pbonzini, david

On (Tue) 14 Jul 2015 [17:22:13], Juan Quintela wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:

> > +    if (enable_mlock) {
> > +        if (os_mlock() < 0) {
> > +            error_report("mlock: %s", strerror(errno));
> > +            /*
> > +             * It doesn't feel right to fail at this point, we have a valid
> > +             * VM state.
> > +             */
> 
> realtime_init() exits if os_mlock() fails, so the current code is:

Yea, I was wondering the same - but then I thought: would the realtime
case want a migration to happen at all?

> - we start qemu with mlock request
> - we mlock memory
> - we start postcopy
> - we munlock memory
> - we mlock memory
> 
> I would really, really prefer having a check for whether memory is mlocked,
> and in that case, just abort migration altogether.  Or better still, wait to
> enable mlock *until* we have finished postcopy, no?

		Amit

* Re: [Qemu-devel] [PATCH v7 41/42] Disable mlock around incoming postcopy
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 41/42] Disable mlock around incoming postcopy Dr. David Alan Gilbert (git)
  2015-07-14 15:22   ` Juan Quintela
@ 2015-07-28  6:02   ` Amit Shah
  1 sibling, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-28  6:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:54], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Userfault doesn't work with mlock; mlock is designed to nail down pages
> so they don't move, userfault is designed to tell you when they're not
> there.
> 
> munlock the pages we userfault protect before postcopy.
> mlock everything again at the end if mlock is enabled.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit

* Re: [Qemu-devel] [PATCH v7 42/42] Inhibit ballooning during postcopy
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 42/42] Inhibit ballooning during postcopy Dr. David Alan Gilbert (git)
  2015-07-14 15:24   ` Juan Quintela
@ 2015-07-28  6:15   ` Amit Shah
  2015-07-28  9:08     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 209+ messages in thread
From: Amit Shah @ 2015-07-28  6:15 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:55], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Postcopy detects accesses to pages that haven't been transferred yet
> using userfaultfd, and it causes exceptions on pages that are 'not
> present'.
> Ballooning also causes pages to be marked as 'not present' when the
> guest inflates the balloon.
> Potentially a balloon could be inflated to discard pages that are
> currently inflight during postcopy and that may be arriving at about
> the same time.
> 
> To avoid this confusion, disable ballooning during postcopy.
> 
> When disabled we drop balloon requests from the guest.  Since ballooning
> is generally initiated by the host, the management system should avoid
> initiating any balloon instructions to the guest during migration,
> although it's not possible to know how long it would take a guest to
> process a request made prior to the start of migration.
> 
> Queueing the requests until after migration would be nice, but is
> non-trivial, since the set of inflate/deflate requests have to
> be compared with the state of the page to know what the final
> outcome is allowed to be.

I didn't track the previous discussion, but there were plans to have
guest-initiated balloon requests for cases where the guest wants to
co-operate with hosts and return any free mem available.  We don't
currently have guests that do this, but we also don't want to have a
dependency between the host and guest -- they should be independent.

This approach here seems the simplest possible, short of maintaining
another bitmap for the duration of postcopy which indicates
guest-freed memory pages which postcopy should not populate, after
receiving them at the dest (this sounds better to me than queuing up
guest requests).

The downside here is that the guest offered some memory back, and we
don't use it.  The guest also doesn't use it -- so it's a double loss,
of sorts.

Thoughts?  I don't have a problem with this current approach, but if
we could get something better, that'll be good too.

		Amit

* Re: [Qemu-devel] [PATCH v7 00/42] Postcopy implementation
  2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
                   ` (41 preceding siblings ...)
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 42/42] Inhibit ballooning during postcopy Dr. David Alan Gilbert (git)
@ 2015-07-28  6:21 ` Amit Shah
  42 siblings, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-07-28  6:21 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:13], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
>   This is the 7th cut of my version of postcopy; it is designed for use with
> the Linux kernel additions posted by Andrea Arcangeli here:
> 
> git clone --reference linux -b userfault21
> git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git
> 
> Note this API is slightly different from the last version; but the code is now
> in the linux-mm tree, the API is getting stable and the kernel code is just
> getting fixes now.
> 
> This qemu series can be found at:
> 
> https://github.com/orbitfp7/qemu.git
> on the wp3-postcopy-v7 tag.
> 
> It addresses most of the previous review comments, but there are
> still one or two I'm working on.
> 
> As with v6, the userfaultfd.h isn't included in the tree, so you'll
> need a userfaultfd.h header (and syscall define) from Andrea's kernel.
> 
> This work has been partially funded by the EU Orbit project:
>   see http://www.orbitproject.eu/about/

Very well-split series, that must've taken a lot of work, but makes
reviewing much easier, thanks a lot for that.

I'm mostly through the whole set, a few patches remain.  I don't think
there's anything major here.

As I've said previously, I'd prefer to merge when the kernel patches
are in.  However, we can go ahead with the x- prefix, so we don't
declare anything as a supported interface yet.  Also, configure should
disable this by default because we won't have the syscall number.
If you agree, we can include this early in the 2.5 series.

In addition to the few comments on individual patches, there are a
few things I noted in multiple patches:

* Some patches that introduce qmp interfaces talk of 'Since: 2.4',
  please revise to 2.5.
* Can you please go through the commit messages for formatting /
  grammar?  They're either too terse or lack some punctuation.  If you
  prefer, I can go through these and point these out as well.
* The same applies to some in-line comments as well, but I suppose we
  could fix those later.

Thanks,

		Amit

* Re: [Qemu-devel] [PATCH v7 42/42] Inhibit ballooning during postcopy
  2015-07-28  6:15   ` Amit Shah
@ 2015-07-28  9:08     ` Dr. David Alan Gilbert
  2015-07-28 10:01       ` Amit Shah
  0 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-28  9:08 UTC (permalink / raw)
  To: Amit Shah
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 16 Jun 2015 [11:26:55], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Postcopy detects accesses to pages that haven't been transferred yet
> > using userfaultfd, and it causes exceptions on pages that are 'not
> > present'.
> > Ballooning also causes pages to be marked as 'not present' when the
> > guest inflates the balloon.
> > Potentially a balloon could be inflated to discard pages that are
> > currently inflight during postcopy and that may be arriving at about
> > the same time.
> > 
> > To avoid this confusion, disable ballooning during postcopy.
> > 
> > When disabled we drop balloon requests from the guest.  Since ballooning
> > is generally initiated by the host, the management system should avoid
> > initiating any balloon instructions to the guest during migration,
> > although it's not possible to know how long it would take a guest to
> > process a request made prior to the start of migration.
> > 
> > Queueing the requests until after migration would be nice, but is
> > non-trivial, since the set of inflate/deflate requests have to
> > be compared with the state of the page to know what the final
> > outcome is allowed to be.
> 
> I didn't track the previous discussion, but there were plans to have
> guest-initiated balloon requests for cases where the guest wants to
> co-operate with hosts and return any free mem available.  We don't
> currently have guests that do this, but we also don't want to have a
> dependency between the host and guest -- they should be independent.
> 
> This approach here seems the simplest possible, short of maintaining
> another bitmap for the duration of postcopy which indicates
> guest-freed memory pages which postcopy should not populate, after
> receiving them at the dest (this sounds better to me than queuing up
> guest requests).
> 
> The downside here is that the guest offered some memory back, and we
> don't use it.  The guest also doesn't use it -- so it's a double loss,
> of sorts.
> 
> Thoughts?  I don't have a problem with this current approach, but if
> we could get something better, that'll be good too.

It needs something like that bitmap, but it would take quite a bit
of care to manage the interaction between:
    a) The guest emitting balloon notifications
    b) Pages being received from the source
    c) Destination use of that page

  we also have to think what to do with a page that's been ballooned
after reception of the source page; the madvise(dontneed) that's used
normally would cause userfault to fire again, and we can't allow that.
(We could make it the same as receiving a zero page).   But then we would
also have to cope with  the source sending us a page after the destination
has ballooned it and make sure to discard that (I suspect there are further
ordering examples that have to also be considered).
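
As a rough illustration of the bookkeeping discussed above (this is not code
from the series; every name below is hypothetical and nothing here exists in
QEMU), a pair of per-page bitmaps could record (a) balloon notifications and
(b) page reception, so that a source page arriving after the guest ballooned
it is dropped, and a page ballooned after reception is zeroed in place rather
than dropped with madvise:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* All names here are hypothetical; none of them exist in QEMU. */
#define SK_PAGES 256
#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SK_LONGS ((SK_PAGES + SK_BITS_PER_LONG - 1) / SK_BITS_PER_LONG)

typedef struct {
    unsigned long received[SK_LONGS];  /* (b) page arrived from the source */
    unsigned long ballooned[SK_LONGS]; /* (a) guest has given the page back */
} SkPageState;

typedef enum { SK_PLACE, SK_DROP } SkArrival;
typedef enum { SK_MARK_ONLY, SK_ZERO_IN_PLACE } SkInflate;

static bool sk_test(const unsigned long *map, size_t page)
{
    return (map[page / SK_BITS_PER_LONG] >> (page % SK_BITS_PER_LONG)) & 1UL;
}

static void sk_set(unsigned long *map, size_t page)
{
    map[page / SK_BITS_PER_LONG] |= 1UL << (page % SK_BITS_PER_LONG);
}

/*
 * Guest inflates the balloon over 'page'.  If the page was already
 * received, dropping it with madvise(MADV_DONTNEED) would make userfault
 * fire again, so the caller is told to zero it in place instead.
 */
static SkInflate sk_balloon_inflate(SkPageState *s, size_t page)
{
    assert(page < SK_PAGES);
    sk_set(s->ballooned, page);
    return sk_test(s->received, page) ? SK_ZERO_IN_PLACE : SK_MARK_ONLY;
}

/* A page arrives from the source: discard it if the guest already freed it. */
static SkArrival sk_page_arrived(SkPageState *s, size_t page)
{
    assert(page < SK_PAGES);
    if (sk_test(s->ballooned, page)) {
        return SK_DROP;
    }
    sk_set(s->received, page);
    return SK_PLACE;
}
```

The sketch deliberately ignores deflate requests and destination accesses to a
ballooned page that was never received, which are among the further ordering
cases that would still have to be considered.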

Dave
> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 42/42] Inhibit ballooning during postcopy
  2015-07-28  9:08     ` Dr. David Alan Gilbert
@ 2015-07-28 10:01       ` Amit Shah
  2015-07-28 11:16         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 209+ messages in thread
From: Amit Shah @ 2015-07-28 10:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 28 Jul 2015 [10:08:15], Dr. David Alan Gilbert wrote:
> * Amit Shah (amit.shah@redhat.com) wrote:
> > On (Tue) 16 Jun 2015 [11:26:55], Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Postcopy detects accesses to pages that haven't been transferred yet
> > > using userfaultfd, and it causes exceptions on pages that are 'not
> > > present'.
> > > Ballooning also causes pages to be marked as 'not present' when the
> > > guest inflates the balloon.
> > > Potentially a balloon could be inflated to discard pages that are
> > > currently inflight during postcopy and that may be arriving at about
> > > the same time.
> > > 
> > > To avoid this confusion, disable ballooning during postcopy.
> > > 
> > > When disabled we drop balloon requests from the guest.  Since ballooning
> > > is generally initiated by the host, the management system should avoid
> > > initiating any balloon instructions to the guest during migration,
> > > although it's not possible to know how long it would take a guest to
> > > process a request made prior to the start of migration.
> > > 
> > > Queueing the requests until after migration would be nice, but is
> > > non-trivial, since the set of inflate/deflate requests have to
> > > be compared with the state of the page to know what the final
> > > outcome is allowed to be.
> > 
> > I didn't track the previous discussion, but there were plans to have
> > guest-initiated balloon requests for cases where the guest wants to
> > co-operate with hosts and return any free mem available.  We don't
> > currently have guests that do this, but we also don't want to have a
> > dependency between the host and guest -- they should be independent.
> > 
> > This approach here seems the simplest possible, short of maintaining
> > another bitmap for the duration of postcopy which indicates
> > guest-freed memory pages which postcopy should not populate, after
> > receiving them at the dest (this sounds better to me than queuing up
> > guest requests).
> > 
> > The downside here is that the guest offered some memory back, and we
> > don't use it.  The guest also doesn't use it -- so it's a double loss,
> > of sorts.
> > 
> > Thoughts?  I don't have a problem with this current approach, but if
> > we could get something better, that'll be good too.
> 
> It needs something like that bitmap, but it would take quite a bit
> of care to manage the interaction between:
>     a) The guest emitting balloon notifications
>     b) Pages being received from the source
>     c) Destination use of that page
> 
>   we also have to think what to do with a page that's been ballooned
> after reception of the source page; the madvise(dontneed) that's used
> normally would cause userfault to fire again, and we can't allow that.
> (We could make it the same as receiving a zero page).   But then we would
> also have to cope with  the source sending us a page after the destination
> has ballooned it and make sure to discard that (I suspect there are further
> ordering examples that have to also be considered).

Yeah.  I'm fine with the current approach, with the downsides
mentioned.  Maybe in the commit message, make it explicit that the
guest may think it's given up ownership, but the host won't honour
this until postcopy has finished.

Anyway:

Reviewed-by: Amit Shah <amit.shah@redhat.com>


		Amit

* Re: [Qemu-devel] [PATCH v7 42/42] Inhibit ballooning during postcopy
  2015-07-28 10:01       ` Amit Shah
@ 2015-07-28 11:16         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-28 11:16 UTC (permalink / raw)
  To: Amit Shah
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 28 Jul 2015 [10:08:15], Dr. David Alan Gilbert wrote:
> > * Amit Shah (amit.shah@redhat.com) wrote:
> > > On (Tue) 16 Jun 2015 [11:26:55], Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > Postcopy detects accesses to pages that haven't been transferred yet
> > > > using userfaultfd, and it causes exceptions on pages that are 'not
> > > > present'.
> > > > Ballooning also causes pages to be marked as 'not present' when the
> > > > guest inflates the balloon.
> > > > Potentially a balloon could be inflated to discard pages that are
> > > > currently inflight during postcopy and that may be arriving at about
> > > > the same time.
> > > > 
> > > > To avoid this confusion, disable ballooning during postcopy.
> > > > 
> > > > When disabled we drop balloon requests from the guest.  Since ballooning
> > > > is generally initiated by the host, the management system should avoid
> > > > initiating any balloon instructions to the guest during migration,
> > > > although it's not possible to know how long it would take a guest to
> > > > process a request made prior to the start of migration.
> > > > 
> > > > Queueing the requests until after migration would be nice, but is
> > > > non-trivial, since the set of inflate/deflate requests have to
> > > > be compared with the state of the page to know what the final
> > > > outcome is allowed to be.
> > > 
> > > I didn't track the previous discussion, but there were plans to have
> > > guest-initiated balloon requests for cases where the guest wants to
> > > co-operate with hosts and return any free mem available.  We don't
> > > currently have guests that do this, but we also don't want to have a
> > > dependency between the host and guest -- they should be independent.
> > > 
> > > This approach here seems the simplest possible, short of maintaining
> > > another bitmap for the duration of postcopy which indicates
> > > guest-freed memory pages which postcopy should not populate, after
> > > receiving them at the dest (this sounds better to me than queuing up
> > > guest requests).
> > > 
> > > The downside here is that the guest offered some memory back, and we
> > > don't use it.  The guest also doesn't use it -- so it's a double loss,
> > > of sorts.
> > > 
> > > Thoughts?  I don't have a problem with this current approach, but if
> > > we could get something better, that'll be good too.
> > 
> > It needs something like that bitmap, but it would take quite a bit
> > of care to manage the interaction between:
> >     a) The guest emitting balloon notifications
> >     b) Pages being received from the source
> >     c) Destination use of that page
> > 
> >   we also have to think what to do with a page that's been ballooned
> > after reception of the source page; the madvise(dontneed) that's used
> > normally would cause userfault to fire again, and we can't allow that.
> > (We could make it the same as receiving a zero page).   But then we would
> > also have to cope with  the source sending us a page after the destination
> > has ballooned it and make sure to discard that (I suspect there are further
> > ordering examples that have to also be considered).
> 
> Yeah.  I'm fine with the current approach, with the downsides
> mentioned.  Maybe in the commit message, make it explicit that the
> guest may think it's given up ownership, but the host won't honour
> this until postcopy has finished.

OK, I've added the text:
'Guest initiated ballooning will not know if it's really freed a page
of host memory or not.'

> Anyway:
> 
> Reviewed-by: Amit Shah <amit.shah@redhat.com>

Thanks.

Dave

> 
> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 41/42] Disable mlock around incoming postcopy
  2015-07-28  6:02     ` Amit Shah
@ 2015-07-28 11:32       ` Juan Quintela
  2015-08-06 14:55         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 209+ messages in thread
From: Juan Quintela @ 2015-07-28 11:32 UTC (permalink / raw)
  To: Amit Shah
  Cc: aarcange, yamahata, liang.z.li, Dr. David Alan Gilbert (git),
	qemu-devel, luis, pbonzini, david

Amit Shah <amit.shah@redhat.com> wrote:
> On (Tue) 14 Jul 2015 [17:22:13], Juan Quintela wrote:
>> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
>
>> > +    if (enable_mlock) {
>> > +        if (os_mlock() < 0) {
>> > +            error_report("mlock: %s", strerror(errno));
>> > +            /*
>> > +             * It doesn't feel right to fail at this point, we have a valid
>> > +             * VM state.
>> > +             */
>> 
>> realtime_init() exits if os_mlock() fails, so the current code is:
>
> Yea, I was wondering the same - but then I thought: would the realtime
> case want a migration to happen at all?

Then disabling migration with realtime looks saner.  But that
decision doesn't belong to this series.

>
>> - we start qemu with mlock request
>> - we mlock memory
>> - we start postcopy
>> - we munlock memory
>> - we mlock memory
>> 
>> I would really, really prefer having a check if memory is mlocked, and
>> in that case, just abort migration altogether.  Or better still, wait to
>> enable mlock *until* we have finished postcopy, no?
>
> 		Amit

* Re: [Qemu-devel] [PATCH v7 35/42] Don't sync dirty bitmaps in postcopy
  2015-07-27  7:43   ` Amit Shah
@ 2015-07-31  9:50     ` Dr. David Alan Gilbert
  2015-08-04  5:46       ` Amit Shah
  0 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-31  9:50 UTC (permalink / raw)
  To: Amit Shah
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 16 Jun 2015 [11:26:48], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Once we're in postcopy the source processors are stopped and memory
> > shouldn't change any more, so there's no need to look at the dirty
> > map.
> > 
> > There are two notes to this:
> >   1) If we do resync and a page had changed then the page would get
> >      sent again, which the destination wouldn't allow (since it might
> >      have also modified the page)
> >   2) Before disabling this I'd seen very rare cases where a page had been
> >      marked dirtied although the memory contents are apparently identical
> 
> I suppose we don't know why.  Any way to send a message to the dest
> with this info, so the dest can print out something?  That'll help in
> debugging.  (I'm suggesting sending a message to the dest, because
> after a migration, we don't ever think of looking at messages on the
> src.  And chances are the dest could blow up after a migration is
> successful because of such "corruption".)

One way perhaps would be to do one more sync at the end, after migration
is apparently finished, but before the socket was closed; that would
detect these changes and you could send a message to the other end.  However,
given that, as I say in (2), the page contents were identical where I'd seen
it, this could be a false alarm, so we'd need to be careful.
It also doesn't help you find out *why* it happens, since tracing
back from a bit in the migration bitmap to the area of memory
and the thing that marked it dirty is very hard.  The only way to do
that, is to mark the memory as read-only and then get a backtrace
to find out who tried to change it; but you don't want to do
that on a normal build and cause the source to die.
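
For what it's worth, the "mark it read-only and catch the writer" idea can be
sketched in plain C as below.  This is only an illustration under assumptions:
the names are hypothetical, it is Linux-specific, it calls mprotect() from a
signal handler (fine for a debugging experiment, not something for a normal
build), and a real debugging build would record a backtrace at the marked
point instead of just counting.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names; this is a debugging sketch, not QEMU code. */
static uintptr_t guard_base;
static size_t guard_len;
static volatile sig_atomic_t guard_hits;

static void guard_on_segv(int sig, siginfo_t *info, void *ucontext)
{
    uintptr_t addr = (uintptr_t)info->si_addr;

    (void)sig;
    (void)ucontext;
    if (addr >= guard_base && addr < guard_base + guard_len) {
        guard_hits++;
        /* A debugging build would capture a backtrace of the writer here. */
        mprotect((void *)guard_base, guard_len, PROT_READ | PROT_WRITE);
        return; /* the faulting store is retried and now succeeds */
    }
    _exit(1); /* a fault outside the guarded region is a genuine crash */
}

/* Make [base, base + len) read-only and trap the first write to it. */
static int guard_region(uint8_t *base, size_t len)
{
    struct sigaction sa;

    assert(base != NULL && len > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = guard_on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) < 0) {
        return -1;
    }
    guard_base = (uintptr_t)base;
    guard_len = len;
    guard_hits = 0;
    return mprotect(base, len, PROT_READ);
}
```

The region passed to guard_region() is assumed to be page-aligned (for
example, something returned by mmap()); after the first trapped write the
region becomes writable again, so only the first offender is caught.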

> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> Reviewed-by: Amit Shah <amit.shah@redhat.com>

Thanks.

Dave
> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 36/42] Host page!=target page: Cleanup bitmaps
  2015-07-14 15:01   ` Juan Quintela
@ 2015-07-31 15:53     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-31 15:53 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Prior to the start of postcopy, ensure that everything that will
> > be transferred later is a whole host-page in size.
> >
> > This is accomplished by discarding partially transferred host pages
> > and marking any that are partially dirty as fully dirty.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> >  /*
> > + * Helper for postcopy_chunk_hostpages where HPS/TPS >= bits-in-long
> > + *
> > + * !! Untested !!
> 
> You continue in the race for best comment ever O:-)

I prefer honesty in comments, especially for the next person who tries
to use it!

> > + */
> > +static int hostpage_big_chunk_helper(const char *block_name, void *host_addr,
> > +                                     ram_addr_t offset, ram_addr_t length,
> > +                                     void *opaque)
> > +{
> > +    MigrationState *ms = opaque;
> > +    unsigned long long_bits = sizeof(long) * 8;
> > +    unsigned int host_len = (qemu_host_page_size / TARGET_PAGE_SIZE) /
> > +                            long_bits;
> > +    unsigned long first_long, last_long, cur_long, current_hp;
> > +    unsigned long first = offset >> TARGET_PAGE_BITS;
> > +    unsigned long last = (offset + (length - 1)) >> TARGET_PAGE_BITS;
> > +
> > +    PostcopyDiscardState *pds = postcopy_discard_send_init(ms,
> > +                                                           first,
> > +                                                           block_name);
> 
> Minor
> 
> PostcopyDiscardState *pds =
>                      postcopy_discard_send_init(ms, first, block_name);
> 
> ??

Done.

<snip>

> No need for this:
> 
> find_first_bit()
> find_first_zero_bit()
> 
> You are walking all the words when a single search is enough?

> Wouldn't creative use of bitmap_zero(), bitmap_fill() and just doing a whole
> postcopy_discard_send_range() be better?

<snip>

> > +        mask &= (((unsigned long)1) << ((end & long_bits_mask) + 1)) - 1;
> 
>            bitmap_set(&mask, 0, end);
> 
> 
> Adjust +1/-1 depending on how you do limits?
> 
> BTW, when I need inspiration about how to code functions that deal with
> bits, I search for inspiration in bitmap.c.  Sometimes the function already
> exists, and otherwise, things like BITS_PER_LONG, etc, are already
> defined there.

OK, I've reworked it using the bitmap/bitops.h functions:
   1 file changed, 128 insertions(+), 220 deletions(-)

(still untested).
Doing it this way it's had to be two iterations, one for
fixing up partially sent host pages, and the second for fixing up 
partially dirtied pages.
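
To make the second of those passes concrete, here is a minimal sketch of
widening partially dirty host pages to fully dirty.  It is not the reworked
patch: it uses its own tiny per-bit helpers rather than QEMU's
bitmap/bitops.h functions, the names are hypothetical, and it assumes the
number of target pages is a multiple of the host/target page ratio.  Working
bit by bit, it does not care whether that ratio is smaller or larger than
the number of bits in a long.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers; QEMU has its own equivalents in bitops.h. */
#define SK_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

static bool sk_bit_test(const unsigned long *map, size_t nr)
{
    return (map[nr / SK_BITS_PER_WORD] >> (nr % SK_BITS_PER_WORD)) & 1UL;
}

static void sk_bit_set(unsigned long *map, size_t nr)
{
    map[nr / SK_BITS_PER_WORD] |= 1UL << (nr % SK_BITS_PER_WORD);
}

/*
 * Round a dirty bitmap (one bit per target page) up to host-page
 * granularity: any host page with some, but not all, of its target pages
 * dirty becomes fully dirty.  'tp_per_hp' is the number of target pages
 * per host page.  Returns how many host pages were widened.
 */
static size_t sk_dirty_round_to_host_pages(unsigned long *dirty,
                                           size_t npages, size_t tp_per_hp)
{
    size_t widened = 0;

    assert(tp_per_hp > 0 && npages % tp_per_hp == 0);
    for (size_t hp = 0; hp < npages; hp += tp_per_hp) {
        size_t ndirty = 0;

        for (size_t i = 0; i < tp_per_hp; i++) {
            ndirty += sk_bit_test(dirty, hp + i);
        }
        if (ndirty > 0 && ndirty < tp_per_hp) {
            for (size_t i = 0; i < tp_per_hp; i++) {
                sk_bit_set(dirty, hp + i);
            }
            widened++;
        }
    }
    return widened;
}
```

The pass for partially sent host pages would have the same shape, run over
the sentmap, with a discard sent for each affected host page.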

I'll try and find a !x86 to try it on.


> > +    /* Easiest way to make sure we don't resume in the middle of a host-page */
> > +    last_seen_block = NULL;
> > +    last_sent_block = NULL;
> 
> Best names ever.  And you have to blame me at least for the second one
> to appear :p
> 
> 
> > +
> > +    /*
> > +     * The currently worst known ratio is ARM that has 1kB target pages, and
> > +     * can have 64kB host pages, which is thus inconveniently larger than a long
> > +     * on ARM (32bits), and a long is the underlying element of the migration
> > +     * bitmaps.
> > +     */
> > +    if (host_bits >= long_bits) {
> > +        /* Deal with the odd case separately */
> > +        return qemu_ram_foreach_block(hostpage_big_chunk_helper, ms);
> > +    } else {
> > +        host_mask =  (1ul << host_bits) - 1;
> > +    }
> 
> You can remove the else entirely and just put the code at top level.
> 
> So, we have three cases:
> 
> - host_bits == target_bits -> NOP
> - host_bits >= long_bits
> - host_bits < long_bits
> 
> Couldn't we merge the last two?  they are very similar, and having two
> code paths looks too much to me?

Yep, so those are gone now.

> > @@ -1405,9 +1664,17 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms)
> >      int ret;
> >  
> >      rcu_read_lock();
> > +
> Another not needed.

Moved that back to where the rest of the function came from.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 20/42] Modify save_live_pending for postcopy
  2015-07-13 11:12   ` Juan Quintela
@ 2015-07-31 16:13     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-31 16:13 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, Dr. David Alan Gilbert (git),
	qemu-devel, luis, amit.shah, pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Modify save_live_pending to return separate postcopiable and
> > non-postcopiable counts.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>

Thanks,

> I think that if you make a small change of meaning, everything gets easier:
> 
> > -static uint64_t block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
> > +static void block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
> > +                               uint64_t *non_postcopiable_pending,
> > +                               uint64_t *postcopiable_pending)
> >  {
> >      /* Estimate pending number of bytes to send */
> >      uint64_t pending;
> > @@ -773,7 +775,8 @@ static uint64_t block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
> >      qemu_mutex_unlock_iothread();
> >  
> >      DPRINTF("Enter save live pending  %" PRIu64 "\n", pending);
> > -    return pending;
> > +    *non_postcopiable_pending = pending;
> > +    *postcopiable_pending = 0;
> 
> Change those two lines to:
> 
>        *non_postcopiable_pending += pending;
>        *postcopiable_pending += 0; /* ok, equivalent of doing nothing */
> 
> This way, chaining gets easier?

OK, done; I did it as:

+    /* We can do postcopy, and all the data is postcopiable */
+    *postcopiable_pending += remaining_size;

rather than having the odd += 0;
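
A stripped-down sketch of that accumulate-into-the-totals convention, for
illustration only (the types and function names are hypothetical and much
simpler than the real save_live_pending handlers): the caller zeroes both
totals once, and each handler only adds to the counter it cares about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a save_live_pending handler. */
typedef void (*SkPendingFn)(void *opaque,
                            uint64_t *non_postcopiable_pending,
                            uint64_t *postcopiable_pending);

typedef struct {
    SkPendingFn fn;
    void *opaque;
} SkHandler;

/* A device whose remaining data cannot be postcopied (e.g. block). */
static void sk_block_pending(void *opaque, uint64_t *non_pc, uint64_t *pc)
{
    (void)pc; /* nothing to add to the postcopiable total */
    *non_pc += *(const uint64_t *)opaque;
}

/* A device whose remaining data is all postcopiable (e.g. RAM). */
static void sk_ram_pending(void *opaque, uint64_t *non_pc, uint64_t *pc)
{
    (void)non_pc;
    *pc += *(const uint64_t *)opaque;
}

/* The caller zeroes the totals once; handlers only ever add to them. */
static void sk_total_pending(const SkHandler *handlers, size_t n,
                             uint64_t *non_pc, uint64_t *pc)
{
    assert(non_pc != NULL && pc != NULL);
    *non_pc = 0;
    *pc = 0;
    for (size_t i = 0; i < n; i++) {
        handlers[i].fn(handlers[i].opaque, non_pc, pc);
    }
}
```

With this shape the per-handler temporaries and the two extra additions in
the caller's loop are no longer needed.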

> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index 2c4cbe1..ebd3d31 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -1012,10 +1012,20 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f)
> >      qemu_fflush(f);
> >  }
> >  
> > -uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
> > +/* Give an estimate of the amount left to be transferred,
> > + * the result is split into the amount for units that can and
> > + * for units that can't do postcopy.
> > + */
> > +void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
> > +                               uint64_t *res_non_postcopiable,
> > +                               uint64_t *res_postcopiable)
> >  {
> >      SaveStateEntry *se;
> > -    uint64_t ret = 0;
> > +    uint64_t tmp_non_postcopiable, tmp_postcopiable;
> > +
> > +    *res_non_postcopiable = 0;
> > +    *res_postcopiable = 0;
> > +
> >  
> >      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> >          if (!se->ops || !se->ops->save_live_pending) {
> > @@ -1026,9 +1036,12 @@ uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size)
> >                  continue;
> >              }
> >          }
> > -        ret += se->ops->save_live_pending(f, se->opaque, max_size);
> > +        se->ops->save_live_pending(f, se->opaque, max_size,
> > +                                   &tmp_non_postcopiable, &tmp_postcopiable);
> > +
> > +        *res_postcopiable += tmp_postcopiable;
> > +        *res_non_postcopiable += tmp_non_postcopiable;
> >      }
> > -    return ret;
> 
> With the change, we don't care in the other functions, and this one gets
> simpler IMHO.

Yep,

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 25/42] Postcopy: Maintain sentmap and calculate discard
  2015-07-21 11:36   ` Amit Shah
@ 2015-07-31 16:51     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-31 16:51 UTC (permalink / raw)
  To: Amit Shah
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 16 Jun 2015 [11:26:38], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Where postcopy is preceded by a period of precopy, the destination will
> > have received pages that may have been dirtied on the source after the
> > page was sent.  The destination must throw these pages away before
> > starting its CPUs.
> > 
> > Maintain a 'sentmap' of pages that have already been sent.
> > Calculate list of sent & dirty pages
> > Provide helpers on the destination side to discard these.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
> 
> Some whitespace issues, and some sentences in comments don't have a
> full-stop:
> 
> > +/*
> > + * Called by the bitmap code for each chunk to discard
> > + * May send a discard message, may just leave it queued to
> > + * be sent later
> > + * 'start' and 'end' describe an inclusive range of pages in the
> > + * migration bitmap in the RAM block passed to postcopy_discard_send_init
> > + */
> > +void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
> > +                                unsigned long start, unsigned long end);
> 
> unaligned line; no full-stop in comment above (similar elsewhere, not
> repeating that).

Fixed.

> > +/*
> > + * Discard the contents of memory start..end inclusive.
> > + * We can assume that if we've been called postcopy_ram_hosttest returned true
> > + */
> > +int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
> > +                               uint8_t *end)
> > +{
> > +    trace_postcopy_ram_discard_range(start, end);
> > +    if (madvise(start, (end-start)+1, MADV_DONTNEED)) {
> 
> whitespace around operators

Fixed.
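
As a standalone illustration of the inclusive-range discard being reviewed
here (a sketch only, with a hypothetical function name, no tracing or error
reporting, and assuming Linux and a page-aligned 'start'): for a private
anonymous mapping, the discarded pages read back as zero on the next access.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Discard the contents of memory start..end inclusive.
 * Hypothetical helper: 'start' is assumed to be page-aligned, and the
 * length is (end - start) + 1 because 'end' is the last byte, not one past.
 */
static int sk_discard_range_inclusive(uint8_t *start, uint8_t *end)
{
    assert(end >= start);
    return madvise(start, (size_t)(end - start) + 1, MADV_DONTNEED);
}
```

In the postcopy case the same call on a userfault-registered range makes the
next access fault again, which is what lets the destination re-request the
page from the source.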

> > +/*
> > + * Called by the bitmap code for each chunk to discard
> > + * May send a discard message, may just leave it queued to
> > + * be sent later
> > + * 'start' and 'end' describe an inclusive range of pages in the
> > + * migration bitmap in the RAM block passed to postcopy_discard_send_init
> 
> missing punctuation
> 
> (also, you had started doing doxygen-style comments, want to keep on
> following that style?)

Fixed (I'm sure there are others, still getting into the hang of that).

> > +static RAMBlock *ram_find_block(const char *id)
> 
> just a suggestion, not very particular about this:  rename to
> 
> ram_find_block_by_id()
> 
> instead, so that it's clear what method of finding we're using; also
> no name conflicts when there might be other ways of doing a find.

Done.

Dave

> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works Dr. David Alan Gilbert (git)
                     ` (2 preceding siblings ...)
  2015-06-26  6:46   ` Yang Hongyang
@ 2015-08-04  5:20   ` Amit Shah
  2015-08-05 12:21     ` Dr. David Alan Gilbert
  3 siblings, 1 reply; 209+ messages in thread
From: Amit Shah @ 2015-08-04  5:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:14], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

A few minor comments:

> ---
>  docs/migration.txt | 167 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 167 insertions(+)
> 
> diff --git a/docs/migration.txt b/docs/migration.txt
> index f6df4be..b4b93d1 100644
> --- a/docs/migration.txt
> +++ b/docs/migration.txt
> @@ -291,3 +291,170 @@ save/send this state when we are in the middle of a pio operation
>  (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
>  not enabled, the values on that fields are garbage and don't need to
>  be sent.
> +
> += Return path =
> +
> +In most migration scenarios there is only a single data path that runs
> +from the source VM to the destination, typically along a single fd (although
> +possibly with another fd or similar for some fast way of throwing pages across).
> +
> +However, some uses need two way communication; in particular the Postcopy destination
> +needs to be able to request pages on demand from the source.
> +
> +For these scenarios there is a 'return path' from the destination to the source;
> +qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
> +path.
> +
> +  Source side
> +     Forward path - written by migration thread
> +     Return path  - opened by main thread, read by return-path thread
> +
> +  Destination side
> +     Forward path - read by main thread
> +     Return path  - opened by main thread, written by main thread AND postcopy
> +                    thread (protected by rp_mutex)
> +
> += Postcopy =
> +'Postcopy' migration is a way to deal with migrations that refuse to converge;

(or take too long to converge)

> +its plus side is that there is an upper bound on the amount of migration traffic
> +and time it takes, the down side is that during the postcopy phase, a failure of
> +*either* side or the network connection causes the guest to be lost.
> +
> +In postcopy the destination CPUs are started before all the memory has been
> +transferred, and accesses to pages that are yet to be transferred cause
> +a fault that's translated by QEMU into a request to the source QEMU.
> +
> +Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
> +doesn't finish in a given time the switch is made to postcopy.
> +
> +=== Enabling postcopy ===
> +
> +To enable postcopy (prior to the start of migration):

How about this instead:

"To enable postcopy, issue this command on the monitor prior to the
start of migration:"

Otherwise, there's ambiguity that there is some way to enable this
after a precopy migration has started.

> +
> +migrate_set_capability x-postcopy-ram on
> +
> +The migration will still start in precopy mode, however issuing:

"A future migration will then start in precopy mode.  However,
issuing:"

?

> +
> +migrate_start_postcopy
> +
> +will now cause the transition from precopy to postcopy.
> +It can be issued immediately after migration is started or any
> +time later on.  Issuing it after the end of a migration is harmless.
> +
> +=== Postcopy device transfer ===
> +
> +Loading of device data may cause the device emulation to access guest RAM
> +that may trigger faults that have to be resolved by the source, as such
> +the migration stream has to be able to respond with page data *during* the
> +device load, and hence the device data has to be read from the stream completely
> +before the device load begins to free the stream up.  This is achieved by
> +'packaging' the device data into a blob that's read in one go.
> +
> +Source behaviour
> +
> +Until postcopy is entered the migration stream is identical to normal
> +precopy, except for the addition of a 'postcopy advise' command at
> +the beginning, to tell the destination that postcopy might happen.
> +When postcopy starts the source sends the page discard data and then
> +forms the 'package' containing:
> +
> +   Command: 'postcopy listen'
> +   The device state
> +      A series of sections, identical to the precopy stream's device state stream
> +      containing everything except postcopiable devices (i.e. RAM)
> +   Command: 'postcopy run'
> +
> +The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
> +contents are formatted in the same way as the main migration stream.
> +
> +Destination behaviour
> +
> +Initially the destination looks the same as precopy, with a single thread
> +reading the migration stream; the 'postcopy advise' and 'discard' commands
> +are processed to change the way RAM is managed, but don't affect the stream
> +processing.
> +
> +------------------------------------------------------------------------------
> +                        1      2   3     4 5                      6   7
> +main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
> +thread                             |       |
> +                                   |     (page request)
> +                                   |        \___
> +                                   v            \
> +listen thread:                     --- page -- page -- page -- page -- page --
> +
> +                                   a   b        c
> +------------------------------------------------------------------------------
> +
> +On receipt of CMD_PACKAGED (1)
> +   All the data associated with the package - the ( ... ) section in the
> +diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
> +recurses into qemu_loadvm_state_main to process the contents of the package (2)
> +which contains commands (3,6) and devices (4...)
> +
> +On receipt of 'postcopy listen' - 3 -(i.e. the 1st command in the package)
> +a new thread (a) is started that takes over servicing the migration stream,
> +while the main thread carries on loading the package.   It loads normal
> +background page data (b) but if during a device load a fault happens (5) the
> +returned page (c) is loaded by the listen thread allowing the main threads
> +device load to carry on.
> +
> +The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
> +CPUs start running.
> +At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
> +and is no longer used by migration, while the listen thread carries
> +on servicing page data until the end of migration.
> +
> +=== Postcopy states ===
> +
> +Postcopy moves through a series of states (see postcopy_state) from
> +ADVISE->LISTEN->RUNNING->END
> +
> +  Advise: Set at the start of migration if postcopy is enabled, even
> +          if it hasn't had the start command; here the destination
> +          checks that its OS has the support needed for postcopy, and performs
> +          setup to ensure the RAM mappings are suitable for later postcopy.
> +          (Triggered by reception of POSTCOPY_ADVISE command)

Adding:

"This gives the destination a chance to fail early if postcopy is not
possible."

?

> +
> +  Listen: The first command in the package, POSTCOPY_LISTEN, switches
> +          the destination state to Listen, and starts a new thread
> +          (the 'listen thread') which takes over the job of receiving
> +          pages off the migration stream, while the main thread carries
> +          on processing the blob.  With this thread able to process page
> +          reception, the destination now 'sensitises' the RAM to detect
> +          any access to missing pages (on Linux using the 'userfault'
> +          system).
> +
> +  Running: POSTCOPY_RUN causes the destination to synchronise all
> +          state and start the CPUs and IO devices running.  The main
> +          thread now finishes processing the migration package and
> +          now carries on as it would for normal precopy migration
> +          (although it can't do the cleanup it would do as it
> +          finishes a normal migration).

The indentation went off a bit here.
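
(For reference, the progression described in the quoted text,
ADVISE->LISTEN->RUNNING->END, could be modelled along these lines.  This is
only a sketch: all the Sketch*/SKETCH_* names and the NONE starting state are
made up for illustration and are not taken from the patch, and it assumes the
states are only ever entered in order.)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the identifiers used in the patch. */
typedef enum {
    SKETCH_PS_NONE = 0,   /* assumed starting point, not in the quoted doc */
    SKETCH_PS_ADVISE,     /* POSTCOPY_ADVISE received */
    SKETCH_PS_LISTEN,     /* POSTCOPY_LISTEN received, listen thread started */
    SKETCH_PS_RUNNING,    /* POSTCOPY_RUN received, CPUs started */
    SKETCH_PS_END,
} SketchPostcopyState;

/*
 * Move to 'next' only if it is the immediate successor of the current
 * state.  Returns 0 on success, -1 (leaving *cur unchanged) otherwise.
 */
static int sketch_postcopy_advance(SketchPostcopyState *cur,
                                   SketchPostcopyState next)
{
    assert(cur != NULL);
    if (*cur == SKETCH_PS_END || (int)next != (int)*cur + 1) {
        return -1;
    }
    *cur = next;
    return 0;
}
```

With this, a RUN arriving before LISTEN would be rejected rather than
silently accepted, which is the kind of ordering check the state list implies.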



		Amit

* Re: [Qemu-devel] [PATCH v7 03/42] Init page sizes in qtest
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 03/42] Init page sizes in qtest Dr. David Alan Gilbert (git)
  2015-06-17 11:49   ` Juan Quintela
  2015-07-06  6:14   ` Amit Shah
@ 2015-08-04  5:23   ` Amit Shah
  2 siblings, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-08-04  5:23 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:16], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> One of my patches used a loop that was based on host page size;
> it dies in qtest since qtest hadn't bothered init'ing it.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>


		Amit

* Re: [Qemu-devel] [PATCH v7 19/42] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 19/42] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
  2015-07-13 11:07   ` Juan Quintela
  2015-07-21  6:11   ` Amit Shah
@ 2015-08-04  5:27   ` Amit Shah
  2 siblings, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-08-04  5:27 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:32], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> MIG_CMD_PACKAGED is a migration command that wraps a chunk of migration
> stream inside a package whose length can be determined purely by reading
> its header.  The destination guarantees that the whole MIG_CMD_PACKAGED
> is read off the stream prior to parsing the contents.
> 
> This is used by postcopy to load device state (from the package)
> while leaving the main stream free to receive memory pages.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>
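
(To illustrate the idea in the commit message, reading the whole package off
the stream before parsing any of it, here is a minimal sketch.  It assumes a
4-byte big-endian length header and an in-memory "stream"; the actual header
layout is not shown in this mail, and sketch_read_package and its parameters
are hypothetical names.)

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Read one length-prefixed package from 'stream' ('avail' bytes long).
 * The length is taken purely from the header (assumed: 4 bytes, big endian).
 * The payload is copied in full before the caller parses any of it.
 * Returns a malloc'd copy of the payload, or NULL if the package is
 * truncated, larger than 'max_len', or allocation fails.
 */
static uint8_t *sketch_read_package(const uint8_t *stream, size_t avail,
                                    size_t max_len, uint32_t *len_out,
                                    size_t *consumed)
{
    uint32_t len;
    uint8_t *buf;

    assert(stream != NULL && len_out != NULL && consumed != NULL);
    if (avail < 4) {
        return NULL;
    }
    len = ((uint32_t)stream[0] << 24) | ((uint32_t)stream[1] << 16) |
          ((uint32_t)stream[2] << 8) | (uint32_t)stream[3];
    if (len > max_len || len > avail - 4) {
        return NULL;
    }
    buf = malloc(len ? len : 1);
    if (!buf) {
        return NULL;
    }
    memcpy(buf, stream + 4, len);
    *len_out = len;
    *consumed = 4 + (size_t)len;
    return buf;
}
```

Once the copy exists, the main stream is free again for page data while the
buffered contents are parsed separately.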

		Amit

* Re: [Qemu-devel] [PATCH v7 21/42] postcopy: OS support test
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 21/42] postcopy: OS support test Dr. David Alan Gilbert (git)
  2015-07-13 11:20   ` Juan Quintela
  2015-07-21  7:29   ` Amit Shah
@ 2015-08-04  5:28   ` Amit Shah
  2 siblings, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-08-04  5:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git)
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Tue) 16 Jun 2015 [11:26:34], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Provide a check to see if the OS we're running on has all the bits
> needed for postcopy.
> 
> Creates postcopy-ram.c which will get most of the other helpers we need.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Amit Shah <amit.shah@redhat.com>

		Amit

* Re: [Qemu-devel] [PATCH v7 35/42] Don't sync dirty bitmaps in postcopy
  2015-07-31  9:50     ` Dr. David Alan Gilbert
@ 2015-08-04  5:46       ` Amit Shah
  0 siblings, 0 replies; 209+ messages in thread
From: Amit Shah @ 2015-08-04  5:46 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

On (Fri) 31 Jul 2015 [10:50:46], Dr. David Alan Gilbert wrote:
> * Amit Shah (amit.shah@redhat.com) wrote:
> > On (Tue) 16 Jun 2015 [11:26:48], Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Once we're in postcopy the source processors are stopped and memory
> > > shouldn't change any more, so there's no need to look at the dirty
> > > map.
> > > 
> > > There are two notes to this:
> > >   1) If we do resync and a page had changed then the page would get
> > >      sent again, which the destination wouldn't allow (since it might
> > >      have also modified the page)
> > >   2) Before disabling this I'd seen very rare cases where a page had been
> > >      marked dirtied although the memory contents are apparently identical
> > 
> > I suppose we don't know why.  Any way to send a message to the dest
> > with this info, so the dest can print out something?  That'll help in
> > debugging.  (I'm suggesting sending a message to the dest, because
> > after a migration, we don't ever think of looking at messages on the
> > src.  And chances are the dest could blow up after a migration is
> > successful because of such "corruption".)
> 
> One way perhaps would be to do one more sync at the end, after migration
> is apparently finished, but before the socket was closed; that would
> detect these changes and you could send a message to the other end.  However,
> given that (2) I say that where I'd seen it the page contents were
> identical, this could be a false alarm, so we'd need to be careful.
> It also doesn't help you find out *why* it happens, since tracing
> back from a bit in the migration bitmap to the area of memory
> and the thing that marked it dirty is very hard.  The only way to do
> that, is to mark the memory as read-only and then get a backtrace
> to find out who tried to change it; but you don't want to do
> that on a normal build and cause the source to die.

Agreed - but some notification that something might possibly be wrong
is better than having no clue at all and then struggling to debug
an issue.  In fact, a per-VM flag could be better, since after multiple
migrations such notifications could be lost in the logs of a previous
host that we don't examine.

		Amit

* Re: [Qemu-devel] [PATCH v7 29/42] Postcopy end in migration_thread
  2015-07-13 13:15   ` Juan Quintela
  2015-07-23  6:41     ` Amit Shah
@ 2015-08-04 11:31     ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-08-04 11:31 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > The end of migration in postcopy is a bit different since some of
> > the things normally done at the end of migration have already been
> > done on the transition to postcopy.
> >
> > The end of migration code is getting a bit complciated now, so
> > move out into its own function.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> I think that I would splint the function and then add the postcopy code.

Done; I now have two patches:
  Split out end of migration code from migration_thread
  Postcopy: End of iteration

> BTW, it is a local function, we can use shorter names:
> 
> migration_completion()?
> 
> trace names specifically get hugggggggggge.

Done.

> > +static void migration_thread_end_of_iteration(MigrationState *s,
> > +                                              int current_active_state,
> 
> RunState?
> And it is not needed as parameter.

No, it's not RunState; it's derived from s->state, which is still an int.
It's also not the current state, but the state we expect to be in,
i.e. one of MIGRATION_STATUS_ACTIVE or MIGRATION_STATUS_POSTCOPY_ACTIVE
(which is why it's current_*active*_state), and it's only used as the
parameter to migrate_set_state, in the same way the current code does:

                   migrate_set_state(s, MIGRATION_STATUS_ACTIVE,
                                     MIGRATION_STATUS_COMPLETED);

to ensure that any failure or cancel occurring at the same time isn't lost.
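
Roughly, and only as an illustrative sketch (C11 atomics, made-up SK_* and
sketch_* names, not the actual QEMU code), the idea of passing the expected
old state is a compare-and-swap: the transition only happens if the state is
still what the caller thinks it is.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state values for the sketch. */
enum {
    SK_ACTIVE,
    SK_POSTCOPY_ACTIVE,
    SK_CANCELLING,
    SK_COMPLETED,
    SK_FAILED,
};

/*
 * Switch *state to new_state only if it currently equals expected_old.
 * Returns true if the switch happened; false means another thread
 * (e.g. a cancel) changed the state first, and that change is kept.
 */
static bool sketch_set_state(atomic_int *state, int expected_old,
                             int new_state)
{
    assert(state != NULL);
    return atomic_compare_exchange_strong(state, &expected_old, new_state);
}
```

So if a cancel moved the state to SK_CANCELLING just before completion, the
attempt to go SK_ACTIVE -> SK_COMPLETED fails and the cancel is not
overwritten.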

Dave
> 
> 
> > +                                              bool *old_vm_running,
> > +                                              int64_t *start_time)
> > +{
> > +    int ret;
> > +    if (s->state == MIGRATION_STATUS_ACTIVE) {
>            current_active_state = s->state;
> > +        qemu_mutex_lock_iothread();
> > +        *start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > +        qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> > +        *old_vm_running = runstate_is_running();
> > +
> > +        ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> > +        if (ret >= 0) {
> > +            qemu_file_set_rate_limit(s->file, INT64_MAX);
> > +            qemu_savevm_state_complete_precopy(s->file);
> > +        }
> > +        qemu_mutex_unlock_iothread();
> > +
> > +        if (ret < 0) {
> > +            goto fail;
> > +        }
> > +    } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
>            current_active_state = s->state;
> > +        trace_migration_thread_end_of_iteration_postcopy_end();
> > +
> > +        qemu_savevm_state_complete_postcopy(s->file);
> > +        trace_migration_thread_end_of_iteration_postcopy_end_after_complete();
> > +    }
> > +
> > +    /*
> > +     * If rp was opened we must clean up the thread before
> > +     * cleaning everything else up (since if there are no failures
> > +     * it will wait for the destination to send it's status in
> > +     * a SHUT command).
> > +     * Postcopy opens rp if enabled (even if it's not avtivated)
> > +     */
> > +    if (migrate_postcopy_ram()) {
> > +        int rp_error;
> > +        trace_migration_thread_end_of_iteration_postcopy_end_before_rp();
> > +        rp_error = await_return_path_close_on_source(s);
> > +        trace_migration_thread_end_of_iteration_postcopy_end_after_rp(rp_error);
> > +        if (rp_error) {
> > +            goto fail;
> > +        }
> > +    }
> > +
> > +    if (qemu_file_get_error(s->file)) {
> > +        trace_migration_thread_end_of_iteration_file_err();
> > +        goto fail;
> > +    }
> > +
> > +    migrate_set_state(s, current_active_state, MIGRATION_STATUS_COMPLETED);
> > +    return;
> > +
> > +fail:
> > +    migrate_set_state(s, current_active_state, MIGRATION_STATUS_FAILED);
> > +}
> > +
> > +/*
> >   * Master migration thread on the source VM.
> >   * It drives the migration and pumps the data down the outgoing channel.
> >   */
> > @@ -1233,31 +1294,11 @@ static void *migration_thread(void *opaque)
> >                  /* Just another iteration step */
> >                  qemu_savevm_state_iterate(s->file);
> >              } else {
> > -                int ret;
> > -
> > -                qemu_mutex_lock_iothread();
> > -                start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > -                qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
> > -                old_vm_running = runstate_is_running();
> > -
> > -                ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> > -                if (ret >= 0) {
> > -                    qemu_file_set_rate_limit(s->file, INT64_MAX);
> > -                    qemu_savevm_state_complete_precopy(s->file);
> > -                }
> > -                qemu_mutex_unlock_iothread();
> > +                trace_migration_thread_low_pending(pending_size);
> >  
> > -                if (ret < 0) {
> > -                    migrate_set_state(s, MIGRATION_STATUS_ACTIVE,
> > -                                      MIGRATION_STATUS_FAILED);
> > -                    break;
> > -                }
> > -
> > -                if (!qemu_file_get_error(s->file)) {
> > -                    migrate_set_state(s, MIGRATION_STATUS_ACTIVE,
> > -                                      MIGRATION_STATUS_COMPLETED);
> > -                    break;
> > -                }
> > +                migration_thread_end_of_iteration(s, current_active_type,
> > +                    &old_vm_running, &start_time);
> > +                break;
> >              }
> >          }
> >  
> > diff --git a/trace-events b/trace-events
> > index f096877..528d5a3 100644
> > --- a/trace-events
> > +++ b/trace-events
> > @@ -1425,6 +1425,12 @@ migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
> >  migration_thread_after_loop(void) ""
> >  migration_thread_file_err(void) ""
> >  migration_thread_setup_complete(void) ""
> > +migration_thread_low_pending(uint64_t pending) "%" PRIu64
> > +migration_thread_end_of_iteration_file_err(void) ""
> > +migration_thread_end_of_iteration_postcopy_end(void) ""
> > +migration_thread_end_of_iteration_postcopy_end_after_complete(void) ""
> > +migration_thread_end_of_iteration_postcopy_end_before_rp(void) ""
> > +migration_thread_end_of_iteration_postcopy_end_after_rp(int rp_error) "%d"
> >  open_return_path_on_source(void) ""
> >  open_return_path_on_source_continue(void) ""
> >  postcopy_start(void) ""
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 15/42] Return path: Source handling of return path
  2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 15/42] Return path: Source handling of return path Dr. David Alan Gilbert (git)
  2015-07-13 10:29   ` Juan Quintela
  2015-07-15  7:50   ` Amit Shah
@ 2015-08-05  8:06   ` zhanghailiang
  2015-08-18 10:45     ` Dr. David Alan Gilbert
  2 siblings, 1 reply; 209+ messages in thread
From: zhanghailiang @ 2015-08-05  8:06 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git), qemu-devel
  Cc: aarcange, yamahata, quintela, liang.z.li, peter.huangpeng, luis,
	amit.shah, pbonzini, david

Hi Dave,

On 2015/6/16 18:26, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Open a return path, and handle messages that are received upon it.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   include/migration/migration.h |   8 ++
>   migration/migration.c         | 177 +++++++++++++++++++++++++++++++++++++++++-
>   trace-events                  |  12 +++
>   3 files changed, 196 insertions(+), 1 deletion(-)
>
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 36caab9..868f59a 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -77,6 +77,14 @@ struct MigrationState
>
>       int state;
>       MigrationParams params;
> +
> +    /* State related to return path */
> +    struct {
> +        QEMUFile     *file;

There is already a 'file' member in MigrationState. For plain migration
there is only one direction, from the source side to the destination side,
so that name is fine.

But post-copy and COLO need two-way communication, so we could rename the
original 'file' member of MigrationState to 'output_file' and add a new
'input_file' member. For the MigrationIncomingState struct, rename its original
'file' member to 'input_file' and add a new 'output_file'.
IMHO, this will make things clearer.

Thanks,
zhanghailiang


> +        QemuThread    rp_thread;
> +        bool          error;
> +    } rp_state;
> +
>       double mbps;
>       int64_t total_time;
>       int64_t downtime;
> diff --git a/migration/migration.c b/migration/migration.c
> index afb19a1..fb2f491 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -278,6 +278,23 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>       return params;
>   }
>
> +/*
> + * Return true if we're already in the middle of a migration
> + * (i.e. any of the active or setup states)
> + */
> +static bool migration_already_active(MigrationState *ms)
> +{
> +    switch (ms->state) {
> +    case MIGRATION_STATUS_ACTIVE:
> +    case MIGRATION_STATUS_SETUP:
> +        return true;
> +
> +    default:
> +        return false;
> +
> +    }
> +}
> +
>   static void get_xbzrle_cache_stats(MigrationInfo *info)
>   {
>       if (migrate_use_xbzrle()) {
> @@ -441,6 +458,21 @@ static void migrate_set_state(MigrationState *s, int old_state, int new_state)
>       }
>   }
>
> +static void migrate_fd_cleanup_src_rp(MigrationState *ms)
> +{
> +    QEMUFile *rp = ms->rp_state.file;
> +
> +    /*
> +     * When stuff goes wrong (e.g. failing destination) on the rp, it can get
> +     * cleaned up from a few threads; make sure not to do it twice in parallel
> +     */
> +    rp = atomic_cmpxchg(&ms->rp_state.file, rp, NULL);
> +    if (rp) {
> +        trace_migrate_fd_cleanup_src_rp();
> +        qemu_fclose(rp);
> +    }
> +}
> +
>   static void migrate_fd_cleanup(void *opaque)
>   {
>       MigrationState *s = opaque;
> @@ -448,6 +480,8 @@ static void migrate_fd_cleanup(void *opaque)
>       qemu_bh_delete(s->cleanup_bh);
>       s->cleanup_bh = NULL;
>
> +    migrate_fd_cleanup_src_rp(s);
> +
>       if (s->file) {
>           trace_migrate_fd_cleanup();
>           qemu_mutex_unlock_iothread();
> @@ -487,6 +521,11 @@ static void migrate_fd_cancel(MigrationState *s)
>       QEMUFile *f = migrate_get_current()->file;
>       trace_migrate_fd_cancel();
>
> +    if (s->rp_state.file) {
> +        /* shutdown the rp socket, so causing the rp thread to shutdown */
> +        qemu_file_shutdown(s->rp_state.file);
> +    }
> +
>       do {
>           old_state = s->state;
>           if (old_state != MIGRATION_STATUS_SETUP &&
> @@ -801,8 +840,144 @@ int64_t migrate_xbzrle_cache_size(void)
>       return s->xbzrle_cache_size;
>   }
>
> -/* migration thread support */
> +/*
> + * Something bad happened to the RP stream, mark an error
> + * The caller shall print something to indicate why
> + */
> +static void source_return_path_bad(MigrationState *s)
> +{
> +    s->rp_state.error = true;
> +    migrate_fd_cleanup_src_rp(s);
> +}
> +
> +/*
> + * Handles messages sent on the return path towards the source VM
> + *
> + */
> +static void *source_return_path_thread(void *opaque)
> +{
> +    MigrationState *ms = opaque;
> +    QEMUFile *rp = ms->rp_state.file;
> +    uint16_t expected_len, header_len, header_type;
> +    const int max_len = 512;
> +    uint8_t buf[max_len];
> +    uint32_t tmp32;
> +    int res;
> +
> +    trace_source_return_path_thread_entry();
> +    while (rp && !qemu_file_get_error(rp) &&
> +        migration_already_active(ms)) {
> +        trace_source_return_path_thread_loop_top();
> +        header_type = qemu_get_be16(rp);
> +        header_len = qemu_get_be16(rp);
> +
> +        switch (header_type) {
> +        case MIG_RP_MSG_SHUT:
> +        case MIG_RP_MSG_PONG:
> +            expected_len = 4;
> +            break;
> +
> +        default:
> +            error_report("RP: Received invalid message 0x%04x length 0x%04x",
> +                    header_type, header_len);
> +            source_return_path_bad(ms);
> +            goto out;
> +        }
>
> +        if (header_len > expected_len) {
> +            error_report("RP: Received message 0x%04x with"
> +                    "incorrect length %d expecting %d",
> +                    header_type, header_len,
> +                    expected_len);
> +            source_return_path_bad(ms);
> +            goto out;
> +        }
> +
> +        /* We know we've got a valid header by this point */
> +        res = qemu_get_buffer(rp, buf, header_len);
> +        if (res != header_len) {
> +            trace_source_return_path_thread_failed_read_cmd_data();
> +            source_return_path_bad(ms);
> +            goto out;
> +        }
> +
> +        /* OK, we have the message and the data */
> +        switch (header_type) {
> +        case MIG_RP_MSG_SHUT:
> +            tmp32 = be32_to_cpup((uint32_t *)buf);
> +            trace_source_return_path_thread_shut(tmp32);
> +            if (tmp32) {
> +                error_report("RP: Sibling indicated error %d", tmp32);
> +                source_return_path_bad(ms);
> +            }
> +            /*
> +             * We'll let the main thread deal with closing the RP
> +             * we could do a shutdown(2) on it, but we're the only user
> +             * anyway, so there's nothing gained.
> +             */
> +            goto out;
> +
> +        case MIG_RP_MSG_PONG:
> +            tmp32 = be32_to_cpup((uint32_t *)buf);
> +            trace_source_return_path_thread_pong(tmp32);
> +            break;
> +
> +        default:
> +            break;
> +        }
> +    }
> +    if (rp && qemu_file_get_error(rp)) {
> +        trace_source_return_path_thread_bad_end();
> +        source_return_path_bad(ms);
> +    }
> +
> +    trace_source_return_path_thread_end();
> +out:
> +    return NULL;
> +}
> +
> +__attribute__ (( unused )) /* Until later in patch series */
> +static int open_return_path_on_source(MigrationState *ms)
> +{
> +
> +    ms->rp_state.file = qemu_file_get_return_path(ms->file);
> +    if (!ms->rp_state.file) {
> +        return -1;
> +    }
> +
> +    trace_open_return_path_on_source();
> +    qemu_thread_create(&ms->rp_state.rp_thread, "return path",
> +                       source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
> +
> +    trace_open_return_path_on_source_continue();
> +
> +    return 0;
> +}
> +
> +__attribute__ (( unused )) /* Until later in patch series */
> +/* Returns 0 if the RP was ok, otherwise there was an error on the RP */
> +static int await_return_path_close_on_source(MigrationState *ms)
> +{
> +    /*
> +     * If this is a normal exit then the destination will send a SHUT and the
> +     * rp_thread will exit, however if there's an error we need to cause
> +     * it to exit, which we can do by a shutdown.
> +     * (canceling must also shutdown to stop us getting stuck here if
> +     * the destination died at just the wrong place)
> +     */
> +    if (qemu_file_get_error(ms->file) && ms->rp_state.file) {
> +        qemu_file_shutdown(ms->rp_state.file);
> +    }
> +    trace_await_return_path_close_on_source_joining();
> +    qemu_thread_join(&ms->rp_state.rp_thread);
> +    trace_await_return_path_close_on_source_close();
> +    return ms->rp_state.error;
> +}
> +
> +/*
> + * Master migration thread on the source VM.
> + * It drives the migration and pumps the data down the outgoing channel.
> + */
>   static void *migration_thread(void *opaque)
>   {
>       MigrationState *s = opaque;
> diff --git a/trace-events b/trace-events
> index 5738e3f..282cde1 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1394,12 +1394,24 @@ flic_no_device_api(int err) "flic: no Device Contral API support %d"
>   flic_reset_failed(int err) "flic: reset failed %d"
>
>   # migration.c
> +await_return_path_close_on_source_close(void) ""
> +await_return_path_close_on_source_joining(void) ""
>   migrate_set_state(int new_state) "new state %d"
>   migrate_fd_cleanup(void) ""
> +migrate_fd_cleanup_src_rp(void) ""
>   migrate_fd_error(void) ""
>   migrate_fd_cancel(void) ""
>   migrate_pending(uint64_t size, uint64_t max) "pending size %" PRIu64 " max %" PRIu64
>   migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
> +open_return_path_on_source(void) ""
> +open_return_path_on_source_continue(void) ""
> +source_return_path_thread_bad_end(void) ""
> +source_return_path_thread_end(void) ""
> +source_return_path_thread_entry(void) ""
> +source_return_path_thread_failed_read_cmd_data(void) ""
> +source_return_path_thread_loop_top(void) ""
> +source_return_path_thread_pong(uint32_t val) "%x"
> +source_return_path_thread_shut(uint32_t val) "%x"
>   migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
>
>   # migration/rdma.c
>
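
For readers following the return-path framing in the quoted hunk (a 16-bit
type, a 16-bit length, then the payload, all big endian), here is a minimal
standalone sketch that parses one message from a byte buffer.  The numeric
message type values and all SK_*/sketch_*/Sketch* names are assumptions for
illustration; the real values are not shown in this thread.  Note the sketch
rejects any length other than the expected one, which is stricter than the
'>' comparison in the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical numeric values; the real ones are not shown in this thread. */
enum { SK_RP_MSG_SHUT = 1, SK_RP_MSG_PONG = 2 };

typedef struct {
    uint16_t type;
    uint16_t len;
    uint32_t value;   /* the 32-bit payload, converted from big endian */
} SketchRpMsg;

static uint16_t sk_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t sk_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Parse one message from 'buf' ('avail' bytes).
 * Returns the number of bytes consumed, or -1 if the type is unknown,
 * the length is not the one expected for that type, or data is truncated.
 */
static int sketch_parse_rp_msg(const uint8_t *buf, size_t avail,
                               SketchRpMsg *out)
{
    uint16_t type, len, expected;

    assert(buf != NULL && out != NULL);
    if (avail < 4) {
        return -1;
    }
    type = sk_be16(buf);
    len = sk_be16(buf + 2);

    switch (type) {
    case SK_RP_MSG_SHUT:
    case SK_RP_MSG_PONG:
        expected = 4;   /* both carry a single 32-bit value */
        break;
    default:
        return -1;
    }
    if (len != expected || avail - 4 < len) {
        return -1;
    }
    out->type = type;
    out->len = len;
    out->value = sk_be32(buf + 4);
    return 4 + len;
}
```

Validating the header before touching the payload keeps a corrupt or
truncated return-path message from being interpreted as a status value.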

* Re: [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works.
  2015-08-04  5:20   ` Amit Shah
@ 2015-08-05 12:21     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-08-05 12:21 UTC (permalink / raw)
  To: Amit Shah
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 16 Jun 2015 [11:26:14], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
> 
> A few minor comments:
> 
> > ---
> >  docs/migration.txt | 167 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 167 insertions(+)
> > 
> > diff --git a/docs/migration.txt b/docs/migration.txt
> > index f6df4be..b4b93d1 100644
> > --- a/docs/migration.txt
> > +++ b/docs/migration.txt
> > @@ -291,3 +291,170 @@ save/send this state when we are in the middle of a pio operation
> >  (that is what ide_drive_pio_state_needed() checks).  If DRQ_STAT is
> >  not enabled, the values on that fields are garbage and don't need to
> >  be sent.
> > +
> > += Return path =
> > +
> > +In most migration scenarios there is only a single data path that runs
> > +from the source VM to the destination, typically along a single fd (although
> > +possibly with another fd or similar for some fast way of throwing pages across).
> > +
> > +However, some uses need two way communication; in particular the Postcopy destination
> > +needs to be able to request pages on demand from the source.
> > +
> > +For these scenarios there is a 'return path' from the destination to the source;
> > +qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
> > +path.
> > +
> > +  Source side
> > +     Forward path - written by migration thread
> > +     Return path  - opened by main thread, read by return-path thread
> > +
> > +  Destination side
> > +     Forward path - read by main thread
> > +     Return path  - opened by main thread, written by main thread AND postcopy
> > +                    thread (protected by rp_mutex)
> > +
> > += Postcopy =
> > +'Postcopy' migration is a way to deal with migrations that refuse to converge;
> 
> (or take too long to converge)

Added.

> 
> > +its plus side is that there is an upper bound on the amount of migration traffic
> > +and time it takes, the down side is that during the postcopy phase, a failure of
> > +*either* side or the network connection causes the guest to be lost.
> > +
> > +In postcopy the destination CPUs are started before all the memory has been
> > +transferred, and accesses to pages that are yet to be transferred cause
> > +a fault that's translated by QEMU into a request to the source QEMU.
> > +
> > +Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
> > +doesn't finish in a given time the switch is made to postcopy.
> > +
> > +=== Enabling postcopy ===
> > +
> > +To enable postcopy (prior to the start of migration):
> 
> How about this instead:
> 
> "To enable postcopy, issue this command ont he monitor prior to the
> start of migration:"
> 
> Otherwise, there's ambiguity that there is some way to enable this
> after a precopy migration has started.

Done.

> > +
> > +migrate_set_capability x-postcopy-ram on
> > +
> > +The migration will still start in precopy mode, however issuing:
> 
> "A future migration will then start in precopy mode.  However,
> issuing:"
> 
> ?

Ah yes, I see it's ambiguous because it doesn't say you still need
to do the normal migration stuff to start the migration.

I've changed it to:

The normal commands are then used to start a migration, which is still
started in precopy mode.  Issuing:

migrate_start_postcopy

will now cause the transition from precopy to postcopy.

> > +
> > +migrate_start_postcopy
> > +
> > +will now cause the transition from precopy to postcopy.
> > +It can be issued immediately after migration is started or any
> > +time later on.  Issuing it after the end of a migration is harmless.
> > +
> > +=== Postcopy device transfer ===
> > +
> > +Loading of device data may cause the device emulation to access guest RAM
> > +that may trigger faults that have to be resolved by the source, as such
> > +the migration stream has to be able to respond with page data *during* the
> > +device load, and hence the device data has to be read from the stream completely
> > +before the device load begins to free the stream up.  This is achieved by
> > +'packaging' the device data into a blob that's read in one go.
> > +
> > +Source behaviour
> > +
> > +Until postcopy is entered the migration stream is identical to normal
> > +precopy, except for the addition of a 'postcopy advise' command at
> > +the beginning, to tell the destination that postcopy might happen.
> > +When postcopy starts the source sends the page discard data and then
> > +forms the 'package' containing:
> > +
> > +   Command: 'postcopy listen'
> > +   The device state
> > +      A series of sections, identical to the precopy streams device state stream
> > +      containing everything except postcopiable devices (i.e. RAM)
> > +   Command: 'postcopy run'
> > +
> > +The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
> > +contents are formatted in the same way as the main migration stream.
> > +
> > +Destination behaviour
> > +
> > +Initially the destination looks the same as precopy, with a single thread
> > +reading the migration stream; the 'postcopy advise' and 'discard' commands
> > +are processed to change the way RAM is managed, but don't affect the stream
> > +processing.
> > +
> > +------------------------------------------------------------------------------
> > +                        1      2   3     4 5                      6   7
> > +main -----DISCARD-CMD_PACKAGED ( LISTEN  DEVICE     DEVICE DEVICE RUN )
> > +thread                             |       |
> > +                                   |     (page request)
> > +                                   |        \___
> > +                                   v            \
> > +listen thread:                     --- page -- page -- page -- page -- page --
> > +
> > +                                   a   b        c
> > +------------------------------------------------------------------------------
> > +
> > +On receipt of CMD_PACKAGED (1)
> > +   All the data associated with the package - the ( ... ) section in the
> > +diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
> > +recurses into qemu_loadvm_state_main to process the contents of the package (2)
> > +which contains commands (3,6) and devices (4...)
> > +
> > +On receipt of 'postcopy listen' - 3 -(i.e. the 1st command in the package)
> > +a new thread (a) is started that takes over servicing the migration stream,
> > +while the main thread carries on loading the package.   It loads normal
> > +background page data (b) but if during a device load a fault happens (5) the
> > +returned page (c) is loaded by the listen thread allowing the main threads
> > +device load to carry on.
> > +
> > +The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
> > +CPUs start running.
> > +At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
> > +and is no longer used by migration, while the listen thread carries
> > +on servicing page data until the end of migration.
> > +
> > +=== Postcopy states ===
> > +
> > +Postcopy moves through a series of states (see postcopy_state) from
> > +ADVISE->LISTEN->RUNNING->END
> > +
> > +  Advise: Set at the start of migration if postcopy is enabled, even
> > +          if it hasn't had the start command; here the destination
> > +          checks that its OS has the support needed for postcopy, and performs
> > +          setup to ensure the RAM mappings are suitable for later postcopy.
> > +          (Triggered by reception of POSTCOPY_ADVISE command)
> 
> Adding:
> 
> "This gives the destination a chance to fail early if postcopy is not
> possible."
> 
> ?

I added:
 "The destination will fail early in migration at this point if the
  required OS support is not present.  "


> > +
> > +  Listen: The first command in the package, POSTCOPY_LISTEN, switches
> > +          the destination state to Listen, and starts a new thread
> > +          (the 'listen thread') which takes over the job of receiving
> > +          pages off the migration stream, while the main thread carries
> > +          on processing the blob.  With this thread able to process page
> > +          reception, the destination now 'sensitises' the RAM to detect
> > +          any access to missing pages (on Linux using the 'userfault'
> > +          system).
> > +
> > +  Running: POSTCOPY_RUN causes the destination to synchronise all
> > +          state and start the CPUs and IO devices running.  The main
> > +          thread now finishes processing the migration package and
> > +          now carries on as it would for normal precopy migration
> > +          (although it can't do the cleanup it would do as it
> > +          finishes a normal migration).
> 
> indentation went off a bit

Fixed.
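
For reference, the progression ADVISE->LISTEN->RUNNING->END described above
can be sketched as a tiny state machine.  This is only an illustration: the
enum and function names are hypothetical (not the ones in the series), and it
takes a strict forward-only reading; the real code may well permit other
transitions, for example ending without ever entering Listen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the series refers to its own enum as postcopy_state. */
typedef enum {
    PC_STATE_NONE = 0,
    PC_STATE_ADVISE,
    PC_STATE_LISTEN,
    PC_STATE_RUNNING,
    PC_STATE_END,
} PcState;

/*
 * Assumption for this sketch: only single forward steps along
 * NONE->ADVISE->LISTEN->RUNNING->END are considered valid.
 */
static bool pc_state_transition_ok(PcState from, PcState to)
{
    assert((int)from <= PC_STATE_END && (int)to <= PC_STATE_END);
    return from != PC_STATE_END && (int)to == (int)from + 1;
}
```

With this reading, ADVISE->LISTEN is accepted while ADVISE->RUNNING and
anything leaving END are rejected.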

Thanks,

Dave

> 
> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 31/42] Page request: Process incoming page request
  2015-07-14  9:18   ` Juan Quintela
@ 2015-08-06 10:45     ` Dr. David Alan Gilbert
  2015-10-20 10:29       ` Juan Quintela
  0 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-08-06 10:45 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > On receiving MIG_RPCOMM_REQ_PAGES look up the address and
> > queue the page.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> 
> >      migrate_fd_cleanup_src_rp(s);
> >  
> > +    /* This queue generally should be empty - but in the case of a failed
> > +     * migration might have some droppings in.
> > +     */
> > +    struct MigrationSrcPageRequest *mspr, *next_mspr;
> > +    QSIMPLEQ_FOREACH_SAFE(mspr, &s->src_page_requests, next_req, next_mspr) {
> > +        QSIMPLEQ_REMOVE_HEAD(&s->src_page_requests, next_req);
> 
> How nice of QSIMPLEQ.  To remove elements you don't use mspr....

I guess I'm really just using the FOREACH as a while-not-empty.

> > +        g_free(mspr);
> > +    }
> > +
> >      if (s->file) {
> >          trace_migrate_fd_cleanup();
> >          qemu_mutex_unlock_iothread();
> > @@ -713,6 +729,8 @@ MigrationState *migrate_init(const MigrationParams *params)
> >      s->state = MIGRATION_STATUS_SETUP;
> >      trace_migrate_set_state(MIGRATION_STATUS_SETUP);
> >  
> > +    QSIMPLEQ_INIT(&s->src_page_requests);
> > +
> >      s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >      return s;
> >  }
> > @@ -976,7 +994,25 @@ static void source_return_path_bad(MigrationState *s)
> >  static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
> >                                         ram_addr_t start, ram_addr_t len)
> >  {
> > +    long our_host_ps = getpagesize();
> > +
> >      trace_migrate_handle_rp_req_pages(rbname, start, len);
> > +
> > +    /*
> > +     * Since we currently insist on matching page sizes, just sanity check
> > +     * we're being asked for whole host pages.
> > +     */
> > +    if (start & (our_host_ps-1) ||
> > +       (len & (our_host_ps-1))) {
> 
> 
> I don't know if creating a macro is a good idea?
> #define HOST_ALIGN_CHECK(addr)  (addr & (getpagesize()-1))
> 
> ???
> 
> Don't we have a macro for this in qemu?

Not that I can find; include/exec/cpu-all.h has a HOST_PAGE_ALIGN macro,
but it realigns an address to the boundary rather than being a test.
cpu-all.h also exposes a bunch of globals (e.g. qemu_host_page_size and
qemu_host_page_mask) that would simplify it, but because they live in
cpu-all.h I can't use them here: that header can't be included in
generic code.  I guess they need moving out of cpu-all.h to somewhere
else.
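
As a rough idea of what a test-style helper (as opposed to the realigning
HOST_PAGE_ALIGN) could look like, here is a minimal standalone sketch.  The
function names are hypothetical and the page size is passed in explicitly
rather than taken from getpagesize() or a QEMU global.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if value is a multiple of page_size.
 * page_size must be a non-zero power of two. */
static bool is_host_page_aligned(uint64_t value, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (value & (page_size - 1)) == 0;
}

/* A page request is acceptable only if both its start and its length
 * cover whole host pages. */
static bool req_is_whole_pages(uint64_t start, uint64_t len,
                               uint64_t page_size)
{
    return is_host_page_aligned(start, page_size) &&
           is_host_page_aligned(len, page_size);
}
```

This is the same test as the open-coded `start & (our_host_ps-1)` check in
the patch, just with the mask arithmetic in one place.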

> > index f7d957e..da3e9ea 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -924,6 +924,69 @@ static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block,
> >  }
> >  
> >  /**
> > + * Queue the pages for transmission, e.g. a request from postcopy destination
> > + *   ms: MigrationStatus in which the queue is held
> > + *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse last)
> > + *   start: Offset from the start of the RAMBlock
> > + *   len: Length (in bytes) to send
> > + *   Return: 0 on success
> > + */
> > +int ram_save_queue_pages(MigrationState *ms, const char *rbname,
> > +                         ram_addr_t start, ram_addr_t len)
> > +{
> > +    RAMBlock *ramblock;
> > +
> > +    rcu_read_lock();
> > +    if (!rbname) {
> > +        /* Reuse last RAMBlock */
> > +        ramblock = ms->last_req_rb;
> > +
> > +        if (!ramblock) {
> > +            /*
> > +             * Shouldn't happen, we can't reuse the last RAMBlock if
> > +             * it's the 1st request.
> > +             */
> > +            error_report("ram_save_queue_pages no previous block");
> > +            goto err;
> > +        }
> > +    } else {
> > +        ramblock = ram_find_block(rbname);
> > +
> > +        if (!ramblock) {
> > +            /* We shouldn't be asked for a non-existent RAMBlock */
> > +            error_report("ram_save_queue_pages no block '%s'", rbname);
> > +            goto err;
> > +        }
> 
>        Here?

Is that "Here?" a pointer for the next question?

> > +    }
> > +    trace_ram_save_queue_pages(ramblock->idstr, start, len);
> > +    if (start+len > ramblock->used_length) {
> > +        error_report("%s request overrun start=%zx len=%zx blocklen=%zx",
> > +                     __func__, start, len, ramblock->used_length);
> > +        goto err;
> > +    }
> > +
> > +    struct MigrationSrcPageRequest *new_entry =
> > +        g_malloc0(sizeof(struct MigrationSrcPageRequest));
> > +    new_entry->rb = ramblock;
> > +    new_entry->offset = start;
> > +    new_entry->len = len;
> > +    ms->last_req_rb = ramblock;
> 
> Can we move this line to the else?

Done.

> > +
> > +    qemu_mutex_lock(&ms->src_page_req_mutex);
> > +    memory_region_ref(ramblock->mr);
> 
> I haven't looked further in the patch series yet, but I can't see on
> this patch a memory_region_unref ....  Don't we need it?

No; we take the ref when we put it into the queue, and unref it when
we take it out of the queue (which is in the later patch).  Actually,
that does mean I need to unref when I drain the queue in that QSIMPLEQ_FOREACH;
I'll fix that.
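
To illustrate the ref-on-enqueue / unref-on-dequeue pairing, including the
drain case, here is a self-contained sketch.  It does not use QSIMPLEQ or
memory_region_ref(); the queue, the Region type and its plain integer
refcount are stand-ins invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a reference-counted memory region (hypothetical). */
typedef struct Region {
    int refs;
} Region;

typedef struct PageReq {
    Region *region;
    struct PageReq *next;
} PageReq;

typedef struct ReqQueue {
    PageReq *head;
    PageReq **tail;   /* address of the 'next' slot to fill on insert */
} ReqQueue;

static void queue_init(ReqQueue *q)
{
    q->head = NULL;
    q->tail = &q->head;
}

/* Take a reference when the request enters the queue. */
static void queue_push(ReqQueue *q, Region *r)
{
    PageReq *req = calloc(1, sizeof(*req));
    assert(req != NULL);
    r->refs++;
    req->region = r;
    *q->tail = req;
    q->tail = &req->next;
}

/* Drain as "while not empty", dropping the reference held by each entry.
 * Returns the number of entries removed. */
static int queue_drain(ReqQueue *q)
{
    int dropped = 0;
    while (q->head != NULL) {
        PageReq *req = q->head;
        q->head = req->next;
        req->region->refs--;
        free(req);
        dropped++;
    }
    q->tail = &q->head;
    return dropped;
}
```

After a drain the region's count is back to whatever it was before any
request was queued, which is the property the cleanup path needs.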

> > +    QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
> > +    qemu_mutex_unlock(&ms->src_page_req_mutex);
> > +    rcu_read_unlock();
> 
> Of everything that we have inside the rcu_read_lock() .... Is there
> anything other than the memory_region_ref() that needs rcu?

The ram_find_block_by_id also needs it.

> Would it not be possible to do the memory reference before asking for the
> mutex?

Yes; I've swapped that round so it's:
    memory_region_ref(ramblock->mr);
    qemu_mutex_lock(&ms->src_page_req_mutex);
    QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
    qemu_mutex_unlock(&ms->src_page_req_mutex);
    rcu_read_unlock();

> Once here, do we care about calling malloc with the rcu set?  Or could
> we just call malloc at the beginning of the function and free it on err
> if it turns out not to be needed?

Why would that be better?

Dave

> Thanks, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 34/42] Postcopy: Use helpers to map pages during migration
  2015-07-27  7:39   ` Amit Shah
@ 2015-08-06 11:22     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-08-06 11:22 UTC (permalink / raw)
  To: Amit Shah
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 16 Jun 2015 [11:26:47], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > In postcopy, the destination guest is running at the same time
> > as it's receiving pages; as we receive new pages we must put
> > them into the guests address space atomically to avoid a running
> > CPU accessing a partially written page.
> > 
> > Use the helpers in postcopy-ram.c to map these pages.
> > 
> > qemu_get_buffer_less_copy is used to avoid a copy out of qemu_file
> > in the case that postcopy is going to do a copy anyway.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> 
> > @@ -1881,6 +1890,16 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >      int flags = 0, ret = 0;
> >      static uint64_t seq_iter;
> >      int len = 0;
> > +    /*
> > +     * System is running in postcopy mode, page inserts to host memory must be
> > +     * atomic
> > +     */
> 
> *If* system is running in postcopy mode ....

Done.

> > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > +    bool postcopy_running = postcopy_state_get(mis) >=
> > +                            POSTCOPY_INCOMING_LISTENING;
> > +    void *postcopy_host_page = NULL;
> > +    bool postcopy_place_needed = false;
> > +    bool matching_page_sizes = qemu_host_page_size == TARGET_PAGE_SIZE;
> >  
> >      seq_iter++;
> >  
> > @@ -1896,13 +1915,57 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >      rcu_read_lock();
> >      while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
> >          ram_addr_t addr, total_ram_bytes;
> > -        void *host;
> > +        void *host = 0;
> > +        void *page_buffer = 0;
> > +        void *postcopy_place_source = 0;
> >          uint8_t ch;
> > +        bool all_zero = false;
> >  
> >          addr = qemu_get_be64(f);
> >          flags = addr & ~TARGET_PAGE_MASK;
> >          addr &= TARGET_PAGE_MASK;
> >  
> > +        if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
> > +                     RAM_SAVE_FLAG_XBZRLE)) {
> > +            host = host_from_stream_offset(f, mis, addr, flags);
> > +            if (!host) {
> > +                error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
> > +                ret = -EINVAL;
> > +                break;
> > +            }
> 
> So the host_from_stream_offset was moved here from below.  One
> invocation below is still left, which is a bug..

Thanks, fixed.

> > +            if (!postcopy_running) {
> > +                page_buffer = host;
> > +            } else {
> 
> Instead of this, can we just do:
> 
> 	   page_buffer = host;
> 	   if (postcopy_running) {

done.

> > +                /*
> > +                 * Postcopy requires that we place whole host pages atomically.
> > +                 * To make it atomic, the data is read into a temporary page
> > +                 * that's moved into place later.
> > +                 * The migration protocol uses,  possibly smaller, target-pages
> > +                 * however the source ensures it always sends all the components
> > +                 * of a host page in order.
> > +                 */
> > +                if (!postcopy_host_page) {
> > +                    postcopy_host_page = postcopy_get_tmp_page(mis);
> > +                }
> > +                page_buffer = postcopy_host_page +
> > +                              ((uintptr_t)host & ~qemu_host_page_mask);
> > +                /* If all TP are zero then we can optimise the place */
> > +                if (!((uintptr_t)host & ~qemu_host_page_mask)) {
> > +                    all_zero = true;
> > +                }
> > +
> > +                /*
> > +                 * If it's the last part of a host page then we place the host
> > +                 * page
> > +                 */
> > +                postcopy_place_needed = (((uintptr_t)host + TARGET_PAGE_SIZE) &
> > +                                         ~qemu_host_page_mask) == 0;
> > +                postcopy_place_source = postcopy_host_page;
> > +            }
> > +        } else {
> > +            postcopy_place_needed = false;
> > +        }
> 
> ... and similar for postcopy_place_needed as well?  It becomes much
> easier to read.

Done; actually it's just not needed at all - the function entry initialisation
of that flag is sufficient.
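
For reference, the host-page arithmetic in the hunk above boils down to two
mask operations (`~qemu_host_page_mask` is the page size minus one).  A
minimal standalone sketch, with hypothetical helper names, explicit page
sizes instead of the QEMU globals, and assuming power-of-two sizes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of a target page within its enclosing host page; this is
 * where the target page's data goes inside the temporary host page. */
static size_t offset_in_host_page(uintptr_t host_addr, size_t host_page_size)
{
    assert(host_page_size != 0 &&
           (host_page_size & (host_page_size - 1)) == 0);
    return (size_t)(host_addr & (host_page_size - 1));
}

/* True when this target page is the final piece of its host page, i.e.
 * the point at which the whole host page can be placed atomically. */
static bool is_last_target_page(uintptr_t host_addr, size_t target_page_size,
                                size_t host_page_size)
{
    return offset_in_host_page(host_addr + target_page_size,
                               host_page_size) == 0;
}
```

When host and target page sizes match, every target page is both the first
and the last part of its host page, so the place happens on every page.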

> >          case RAM_SAVE_FLAG_COMPRESS_PAGE:
> > -            host = host_from_stream_offset(f, addr, flags);
> > +            all_zero = false;
> > +            if (postcopy_running) {
> > +                error_report("Compressed RAM in postcopy mode @%zx\n", addr);
> > +                return -EINVAL;
> > +            }
> > +            host = host_from_stream_offset(f, mis, addr, flags);
> 
> This line should go (as mentioned above)?

Yes, done.

Dave

> 
> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 14/42] Return path: Send responses from destination to source
  2015-07-01  9:29       ` Juan Quintela
@ 2015-08-06 12:18         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-08-06 12:18 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > * Juan Quintela (quintela@redhat.com) wrote:
> >> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> >> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >> >
> >> > Add migrate_send_rp_message to send a message from destination to source along the return path.
> >> >   (It uses a mutex to let it be called from multiple threads)
> >> > Add migrate_send_rp_shut to send a 'shut' message to indicate
> >> >   the destination is finished with the RP.
> >> > Add migrate_send_rp_ack to send a 'PONG' message in response to a PING
> >> >   Use it in the MSG_RP_PING handler
> >> >
> >> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >> > ---
> >> >  include/migration/migration.h | 17 ++++++++++++++++
> >> >  migration/migration.c         | 45 +++++++++++++++++++++++++++++++++++++++++++
> >> >  migration/savevm.c            |  2 +-
> >> >  trace-events                  |  1 +
> >> >  4 files changed, 64 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> >> > index 65fe5db..36caab9 100644
> >> > --- a/include/migration/migration.h
> >> > +++ b/include/migration/migration.h
> >> > @@ -42,12 +42,20 @@ struct MigrationParams {
> >> >      bool shared;
> >> >  };
> >> >  
> >> > +/* Messages sent on the return path from destination to source */
> >> > +enum mig_rp_message_type {
> >> > +    MIG_RP_MSG_INVALID = 0,  /* Must be 0 */
> >> > +    MIG_RP_MSG_SHUT,         /* sibling will not send any more RP messages */
> >> > +    MIG_RP_MSG_PONG,         /* Response to a PING; data (seq: be32 ) */
> >> > +};
> >> > +
> >> >  typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
> >> >  /* State for the incoming migration */
> >> >  struct MigrationIncomingState {
> >> >      QEMUFile *file;
> >> >  
> >> >      QEMUFile *return_path;
> >> > +    QemuMutex      rp_mutex;    /* We send replies from multiple threads */
> >> >  
> >> >      /* See savevm.c */
> >> >      LoadStateEntry_Head loadvm_handlers;
> >> > @@ -179,6 +187,15 @@ int migrate_compress_level(void);
> >> >  int migrate_compress_threads(void);
> >> >  int migrate_decompress_threads(void);
> >> >  
> >> > +/* Sending on the return path - generic and then for each message type */
> >> > +void migrate_send_rp_message(MigrationIncomingState *mis,
> >> > +                             enum mig_rp_message_type message_type,
> >> > +                             uint16_t len, void *data);
> >> > +void migrate_send_rp_shut(MigrationIncomingState *mis,
> >> > +                          uint32_t value);
> >> > +void migrate_send_rp_pong(MigrationIncomingState *mis,
> >> > +                          uint32_t value);
> >> > +
> >> >  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
> >> >  void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
> >> >  void ram_control_load_hook(QEMUFile *f, uint64_t flags);
> >> > diff --git a/migration/migration.c b/migration/migration.c
> >> > index 295f15a..afb19a1 100644
> >> > --- a/migration/migration.c
> >> > +++ b/migration/migration.c
> >> > @@ -85,6 +85,7 @@ MigrationIncomingState *migration_incoming_state_new(QEMUFile* f)
> >> >      mis_current = g_malloc0(sizeof(MigrationIncomingState));
> >> >      mis_current->file = f;
> >> >      QLIST_INIT(&mis_current->loadvm_handlers);
> >> > +    qemu_mutex_init(&mis_current->rp_mutex);
> >> >  
> >> >      return mis_current;
> >> >  }
> >> > @@ -182,6 +183,50 @@ void process_incoming_migration(QEMUFile *f)
> >> >      qemu_coroutine_enter(co, f);
> >> >  }
> >> >  
> >> > +/*
> >> > + * Send a message on the return channel back to the source
> >> > + * of the migration.
> >> > + */
> >> > +void migrate_send_rp_message(MigrationIncomingState *mis,
> >> > +                             enum mig_rp_message_type message_type,
> >> > +                             uint16_t len, void *data)
> >> > +{
> >> > +    trace_migrate_send_rp_message((int)message_type, len);
> >> > +    qemu_mutex_lock(&mis->rp_mutex);
> >> > +    qemu_put_be16(mis->return_path, (unsigned int)message_type);
> >> > +    qemu_put_be16(mis->return_path, len);
> >> if (len) {
> >> 
> >> > +    qemu_put_buffer(mis->return_path, data, len);
> >> }
> >> 
> >> 
> >> ?
> >> 
> >> We check for zero sized command on control commands but not on
> >> responses?
> >
> > Or should I remove the check in the control commands case?
> > qemu_put_buffer looks like it's safe for size == 0
> 
> I would go for this just for consistency?

OK, check removed in both cases.
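
As an illustration of the framing that migrate_send_rp_message puts on the
wire (type as be16, length as be16, then the payload), here is a standalone
sketch that writes into a plain byte buffer instead of a QEMUFile.  The
function names are hypothetical.  Note that this sketch does guard the copy
for len == 0, because memcpy with a NULL pointer is undefined even for a
zero length; that is a property of memcpy, not of qemu_put_buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 16-bit value in big-endian (network) order. */
static void put_be16(uint8_t *dst, uint16_t v)
{
    dst[0] = (uint8_t)(v >> 8);
    dst[1] = (uint8_t)(v & 0xff);
}

/*
 * Frame one return-path message as: type (be16), len (be16), payload.
 * Returns the number of bytes written, or 0 if 'out' is too small.
 */
static size_t rp_frame_message(uint8_t *out, size_t out_size,
                               uint16_t type, const void *data, uint16_t len)
{
    size_t total = 4 + (size_t)len;

    assert(out != NULL);
    if (out_size < total) {
        return 0;
    }
    put_be16(out, type);
    put_be16(out + 2, len);
    if (len > 0) {
        memcpy(out + 4, data, len);
    }
    return total;
}
```

A zero-length message is therefore just the 4-byte header, and a PONG
carrying a be32 sequence number is 8 bytes in total.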

Dave

(Resend: I dropped the list off the first message)
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 30/42] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command
  2015-07-13 13:24   ` Juan Quintela
@ 2015-08-06 14:15     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-08-06 14:15 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Add MIG_RP_MSG_REQ_PAGES command on Return path for the postcopy
> > destination to request a page from the source.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/migration.h |  4 +++
> >  migration/migration.c         | 70 +++++++++++++++++++++++++++++++++++++++++++
> >  trace-events                  |  1 +
> >  3 files changed, 75 insertions(+)
> >
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index 68a1731..8742d53 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -47,6 +47,8 @@ enum mig_rp_message_type {
> >      MIG_RP_MSG_INVALID = 0,  /* Must be 0 */
> >      MIG_RP_MSG_SHUT,         /* sibling will not send any more RP messages */
> >      MIG_RP_MSG_PONG,         /* Response to a PING; data (seq: be32 ) */
> > +
> > +    MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be64) */
> 
> Not that I really care, but I think that len could be 32 bits.  I am
> not seeing networking getting good at multi-gigabyte transfers soon O:-)

Done.

> > +void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
> > +                              ram_addr_t start, ram_addr_t len);
> 
> Shouldn't len be a size_t?
> (yes, I know that migration code is not really consistent about that)

Done (fun combination with the change above, but still)

> >  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
> >  void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 3e5a7c8..0373b77 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -113,6 +113,36 @@ static void deferred_incoming_migration(Error **errp)
> >      deferred_incoming = true;
> >  }
> >  
> > +/* Request a range of pages from the source VM at the given
> > + * start address.
> > + *   rbname: Name of the RAMBlock to request the page in, if NULL it's the same
> > + *           as the last request (a name must have been given previously)
> > + *   Start: Address offset within the RB
> > + *   Len: Length in bytes required - must be a multiple of pagesize
> > + */
> > +void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
> > +                               ram_addr_t start, ram_addr_t len)
> > +{
> > +    uint8_t bufc[16+1+255]; /* start (8 byte), len (8 byte), rbname upto 256 */
> > +    uint64_t *buf64 = (uint64_t *)bufc;
> > +    size_t msglen = 16; /* start + len */
> > +
> > +    assert(!(len & 1));
> 
> ohhhh, why can't we get a real flags field?
> 
> Scratch that.  Seeing the rest of the code, can't we have two commands:
> 
> MIG_RP_MSG_REQ_PAGES
> MIG_RP_MSG_REQ_PAGES_WITH_ID
> 
> I am not really sure that it makes sense getting a command that can be
> of two different lengths only for that?
> 
> I am not sure, but having a command with two different payloads look
> strange.

Done (I made it _ID rather than WITH_ID - it was getting a bit long).
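
To make the two-command split concrete, here is a sketch of how the payloads
might be encoded.  The layout is an assumption pieced together from the
changes agreed in this thread (a 32-bit len, and a separate _ID command that
carries the RAMBlock name); the function and type names are hypothetical and
the exact wire format of the next revision is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Which of the two commands the payload belongs to (hypothetical type). */
typedef enum { REQ_KIND_PAGES, REQ_KIND_PAGES_ID } ReqKind;

static void put_be32(uint8_t *dst, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        dst[i] = (uint8_t)(v >> (24 - 8 * i));
    }
}

static void put_be64(uint8_t *dst, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        dst[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/*
 * Assumed layout: start (be64), len (be32); the _ID variant then appends a
 * one-byte name length and the name.  rbname == NULL means "same RAMBlock
 * as the last request".  Returns the payload size, or 0 if out is too small.
 */
static size_t encode_req_pages(uint8_t *out, size_t out_size,
                               const char *rbname, uint64_t start,
                               uint32_t len, ReqKind *kind)
{
    size_t n = 12;
    size_t name_len = rbname ? strlen(rbname) : 0;

    assert(name_len < 256);
    if (out_size < n + (rbname ? 1 + name_len : 0)) {
        return 0;
    }
    put_be64(out, start);
    put_be32(out + 8, len);
    if (rbname) {
        out[n++] = (uint8_t)name_len;
        memcpy(out + n, rbname, name_len);
        n += name_len;
        *kind = REQ_KIND_PAGES_ID;
    } else {
        *kind = REQ_KIND_PAGES;
    }
    return n;
}
```

Under this layout a repeat request for the same block is a fixed 12 bytes,
while a request naming a block such as "pc.ram" is 19 bytes, so each command
has a single payload shape.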

Dave

> 
> Later, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 30/42] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command
  2015-07-23  6:50   ` Amit Shah
@ 2015-08-06 14:21     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-08-06 14:21 UTC (permalink / raw)
  To: Amit Shah
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 16 Jun 2015 [11:26:43], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Add MIG_RP_MSG_REQ_PAGES command on Return path for the postcopy
> > destination to request a page from the source.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -113,6 +113,36 @@ static void deferred_incoming_migration(Error **errp)
> >      deferred_incoming = true;
> >  }
> >  
> > +/* Request a range of pages from the source VM at the given
> > + * start address.
> > + *   rbname: Name of the RAMBlock to request the page in, if NULL it's the same
> > + *           as the last request (a name must have been given previously)
> 
> Why not just send the name all the time?

Omitting the name shrinks the messages quite a bit, and it saves the source
a name lookup on each request; it just knows it's the same block as last time.

> > @@ -1010,6 +1058,28 @@ static void *source_return_path_thread(void *opaque)
> >              trace_source_return_path_thread_pong(tmp32);
> >              break;
> >  
> > +        case MIG_RP_MSG_REQ_PAGES:
> > +            start = be64_to_cpup((uint64_t *)buf);
> > +            len = be64_to_cpup(((uint64_t *)buf)+1);
> > +            tmpstr = NULL;
> > +            if (len & 1) {
> > +                len -= 1; /* Remove the flag */
> > +                /* Now we expect an idstr */
> > +                tmp32 = buf[16]; /* Length of the following idstr */
> > +                tmpstr = (char *)&buf[17];
> > +                buf[17+tmp32] = '\0';
> > +                expected_len = 16+1+tmp32;
> 
> Whitespace missing around operators

Done.

> > +            } else {
> > +                expected_len = 16;
> > +            }
> 
> This else can be removed if expected_len is set before the if

That's simplified out with the change Juan suggested.

Dave

> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 41/42] Disable mlock around incoming postcopy
  2015-07-28 11:32       ` Juan Quintela
@ 2015-08-06 14:55         ` Dr. David Alan Gilbert
  2015-08-07  3:05           ` zhanghailiang
  0 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-08-06 14:55 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, zhang.zhanghailiang, liang.z.li, qemu-devel,
	luis, Amit Shah, pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> Amit Shah <amit.shah@redhat.com> wrote:
> > On (Tue) 14 Jul 2015 [17:22:13], Juan Quintela wrote:
> >> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> >
> >> > +    if (enable_mlock) {
> >> > +        if (os_mlock() < 0) {
> >> > +            error_report("mlock: %s", strerror(errno));
> >> > +            /*
> >> > +             * It doesn't feel right to fail at this point, we have a valid
> >> > +             * VM state.
> >> > +             */
> >> 
> >> realtime_init() exits if os_mlock() fails, so current code is:
> >
> > Yea, I was wondering the same - but then I thought: would the realtime
> > case want a migration to happen at all?
> 
> Then disabling migration with realtime looks saner.  But that
> decision doesn't belong to this series.

I added this patch because Zhanghailiang had reported trying to use it and it
failing.

Zhanghailiang: Do you have a use case for mlock=on and migration?

Dave

> 
> >
> >> - we start qemu with mlock request
> >> - we mlock memory
> >> - we start postcopy
> >> - we munlock memory
> >> - we mlock memory
> >> 
> >> I would really, really prefer having a check whether memory is mlocked,
> >> and in that case, just abort migration altogether.  Or better still, wait
> >> to enable mlock *until* we have finished postcopy, no?
> >
> > 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 41/42] Disable mlock around incoming postcopy
  2015-08-06 14:55         ` Dr. David Alan Gilbert
@ 2015-08-07  3:05           ` zhanghailiang
  0 siblings, 0 replies; 209+ messages in thread
From: zhanghailiang @ 2015-08-07  3:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, peter.huangpeng, qemu-devel, luis,
	Amit Shah, pbonzini, david

On 2015/8/6 22:55, Dr. David Alan Gilbert wrote:
> * Juan Quintela (quintela@redhat.com) wrote:
>> Amit Shah <amit.shah@redhat.com> wrote:
>>> On (Tue) 14 Jul 2015 [17:22:13], Juan Quintela wrote:
>>>> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
>>>
>>>>> +    if (enable_mlock) {
>>>>> +        if (os_mlock() < 0) {
>>>>> +            error_report("mlock: %s", strerror(errno));
>>>>> +            /*
>>>>> +             * It doesn't feel right to fail at this point, we have a valid
>>>>> +             * VM state.
>>>>> +             */
>>>>
>>>> realtime_init() exits if os_mlock() fails, so current code is:
>>>
>>> Yea, I was wondering the same - but then I thought: would the realtime
>>> case want a migration to happen at all?
>>
>> Then disabling migration with realtime looks saner.  But that
>> decision doesn't belong to this series.
>
> I added this patch because Zhanghailiang had reported trying to use it and it
> failing.
>
> Zhanghailiang: Do you have a use case for mlock=on and migration?
>

Yes, we usually configure mlock=on for VMs that need high performance, and we also support migration for these VMs.
It works well for pre-copy migration with this configuration.
IMHO, it is better to support this for post-copy migration too~

(Or maybe we could add a command to dynamically mlock/munlock memory in qemu,
and let the management layer decide what to do if they want to post-copy a VM with mlocked memory?).
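
For context, the unlock/relock sequence discussed in this thread (munlock
before postcopy, mlock again afterwards, and not failing if the relock goes
wrong) can be sketched with the plain POSIX calls.  The wrapper and the
example phase below are hypothetical; QEMU's own os_mlock() is not used here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical wrapper: unlock all memory, run one phase (e.g. postcopy),
 * then try to lock memory again.  A relock failure is reported through
 * *relock_failed but does not change the phase's own result.
 */
static int run_with_mlock_suspended(int (*phase)(void *), void *opaque,
                                    int *relock_failed)
{
    int ret;

    assert(phase != NULL && relock_failed != NULL);
    if (munlockall() < 0) {
        fprintf(stderr, "munlockall: %s\n", strerror(errno));
    }
    ret = phase(opaque);
    *relock_failed = 0;
    if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0) {
        fprintf(stderr, "mlockall: %s\n", strerror(errno));
        *relock_failed = 1;
    }
    return ret;
}

/* Example phase used only to exercise the wrapper. */
static int example_phase(void *opaque)
{
    *(int *)opaque += 1;
    return 0;
}
```

Whether a failed relock should be fatal is exactly the policy question
above; the sketch only reports it and leaves the decision to the caller.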


>>
>>>
>>>> - we start qemu with mlock request
>>>> - we mlock memory
>>>> - we start postcopy
>>>> - we munlock memory
>>>> - we mlock memory
>>>>
>>>> I would really, really prefer having a check whether memory is mlocked,
>>>> and in that case, just abort migration altogether.  Or better still, wait
>>>> to enable mlock *until* we have finished postcopy, no?
>>>
>>> 		Amit
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>


* Re: [Qemu-devel] [PATCH v7 15/42] Return path: Source handling of return path
  2015-07-13 10:29   ` Juan Quintela
@ 2015-08-18 10:23     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-08-18 10:23 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Open a return path, and handle messages that are received upon it.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > +/*
> > + * Handles messages sent on the return path towards the source VM
> > + *
> > + */
> > +static void *source_return_path_thread(void *opaque)
> > +{
> > +    MigrationState *ms = opaque;
> > +    QEMUFile *rp = ms->rp_state.file;
> > +    uint16_t expected_len, header_len, header_type;
> > +    const int max_len = 512;
> > +    uint8_t buf[max_len];
> > +    uint32_t tmp32;
> > +    int res;
> > +
> > +    trace_source_return_path_thread_entry();
> > +    while (rp && !qemu_file_get_error(rp) &&
> 
> What can make rp == NULL?
> Thinking about that, could you mean *rp here?

I've reworked this;  it was meant to catch the case of the rp being
closed early, but was racy.

I've now got:
    while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
           migration_already_active(ms)) {

and now the rp qemu_file gets closed at the end of the thread;
anyone else that wants the rp_thread to exit sets the error flag.
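
The ownership rule described here (only the reader thread closes the file, everyone else just raises a flag) can be sketched in isolation as below. This is a simplified model under stated assumptions, not the reworked QEMU code: `struct rp_state_sketch`, `rp_request_exit()` and `rp_reader()` are hypothetical names, an array of ints stands in for the message stream, and C11 atomics stand in for QEMU's own atomic helpers.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the return-path state; not QEMU's struct */
struct rp_state_sketch {
    atomic_bool error;   /* set by anyone who wants the reader to stop */
    bool file_open;      /* models the QEMUFile owned by the reader */
    const int *msgs;     /* models the incoming stream; < 0 is a bad message */
    size_t n_msgs;
};

/* Any thread may request an exit; it never touches the file itself */
void rp_request_exit(struct rp_state_sketch *rp)
{
    atomic_store(&rp->error, true);
}

/* Reader loop: returns the number of messages handled before stopping */
size_t rp_reader(struct rp_state_sketch *rp)
{
    size_t handled = 0;

    while (!atomic_load(&rp->error) && handled < rp->n_msgs) {
        if (rp->msgs[handled] < 0) {
            /* A bad message is reported through the same flag */
            rp_request_exit(rp);
            break;
        }
        handled++;
    }

    /* Single owner: the file is closed exactly once, at thread exit */
    rp->file_open = false;
    return handled;
}
```

Because no other thread closes the file, there is no window in which the reader can use a handle that has already been freed, which is the race the earlier `rp` NULL check could not rule out.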

> > +        migration_already_active(ms)) {
> > +        trace_source_return_path_thread_loop_top();
> > +        header_type = qemu_get_be16(rp);
> > +        header_len = qemu_get_be16(rp);
> > +
> > +        switch (header_type) {
> > +        case MIG_RP_MSG_SHUT:
> > +        case MIG_RP_MSG_PONG:
> > +            expected_len = 4;
> > +            break;
> > +
> > +        default:
> > +            error_report("RP: Received invalid message 0x%04x length 0x%04x",
> > +                    header_type, header_len);
> > +            source_return_path_bad(ms);
> > +            goto out;
> > +        }
> >  
> > +        if (header_len > expected_len) {
> > +            error_report("RP: Received message 0x%04x with"
> > +                    "incorrect length %d expecting %d",
> > +                    header_type, header_len,
> > +                    expected_len);
> 
> I know this is a big request, but getting an array with message lengths
> and message names to be able to print nice error messages looks like a good idea?

Done; (same way as for the commands on the forward path).
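
For readers following along, a table of this kind might look roughly like the sketch below. The enum values, the `SKETCH_` prefix, and the function names are all assumptions made for illustration and are not taken from the reworked patch; the sketch also checks for an exact length match, which is one possible policy.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative message numbers; the real values are not shown in this mail */
enum {
    SKETCH_RP_MSG_INVALID = 0,
    SKETCH_RP_MSG_SHUT,
    SKETCH_RP_MSG_PONG,
    SKETCH_RP_MSG_MAX
};

/* One entry per message type: expected payload length and printable name */
static const struct {
    int len;            /* expected payload length, -1 if never valid */
    const char *name;
} rp_msg_args[SKETCH_RP_MSG_MAX] = {
    [SKETCH_RP_MSG_INVALID] = { .len = -1, .name = "INVALID" },
    [SKETCH_RP_MSG_SHUT]    = { .len =  4, .name = "SHUT" },
    [SKETCH_RP_MSG_PONG]    = { .len =  4, .name = "PONG" },
};

/* Printable name for use in error messages */
const char *rp_msg_name(uint16_t type)
{
    return type < SKETCH_RP_MSG_MAX ? rp_msg_args[type].name : "UNKNOWN";
}

/* Returns 0 if the header is acceptable, -1 for a bad type, -2 for a bad length */
int rp_check_header(uint16_t type, uint16_t len)
{
    if (type >= SKETCH_RP_MSG_MAX || rp_msg_args[type].len < 0) {
        return -1;
    }
    if (len != rp_msg_args[type].len) {
        return -2;
    }
    return 0;
}
```

With a table like this, the two `switch` statements on `header_type` only need to handle the payload, and the error path can print the message name instead of a raw hex code.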

> > +            source_return_path_bad(ms);
> > +            goto out;
> > +        }
> > +
> > +        /* We know we've got a valid header by this point */
> > +        res = qemu_get_buffer(rp, buf, header_len);
> > +        if (res != header_len) {
> > +            trace_source_return_path_thread_failed_read_cmd_data();
> > +            source_return_path_bad(ms);
> > +            goto out;
> > +        }
> > +
> > +        /* OK, we have the message and the data */
> > +        switch (header_type) {
> > +        case MIG_RP_MSG_SHUT:
> > +            tmp32 = be32_to_cpup((uint32_t *)buf);
> 
> make local variable and call it sibling_error or whatever you like?

Done.

> > +            trace_source_return_path_thread_shut(tmp32);
> > +            if (tmp32) {
> > +                error_report("RP: Sibling indicated error %d", tmp32);
> > +                source_return_path_bad(ms);
> > +            }
> > +            /*
> > +             * We'll let the main thread deal with closing the RP
> > +             * we could do a shutdown(2) on it, but we're the only user
> > +             * anyway, so there's nothing gained.
> > +             */
> > +            goto out;
> > +
> > +        case MIG_RP_MSG_PONG:
> > +            tmp32 = be32_to_cpup((uint32_t *)buf);
> 
> unused?
> Although I guess it is used somewhere to make sure that the value is
> the same as whatever we sent in the ping.  credentials?
> 
> I can't see with this and previous patch what value is sent here.

I'm not using the ping/pong messages for anything active, they exist
primarily as a debug and tracing aid.

> > +            trace_source_return_path_thread_pong(tmp32);
> > +            break;
> > +
> > +        default:
> > +            break;
> > +        }
> > +    }
> > +    if (rp && qemu_file_get_error(rp)) {
> > +        trace_source_return_path_thread_bad_end();
> > +        source_return_path_bad(ms);
> > +    }
> > +
> > +    trace_source_return_path_thread_end();
> > +out:
> > +    return NULL;
> > +}
> > +
> > +__attribute__ (( unused )) /* Until later in patch series */
> 
> unused_by_now attribute required O:-)

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 15/42] Return path: Source handling of return path
  2015-08-05  8:06   ` zhanghailiang
@ 2015-08-18 10:45     ` Dr. David Alan Gilbert
  2015-08-18 11:29       ` zhanghailiang
  0 siblings, 1 reply; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-08-18 10:45 UTC (permalink / raw)
  To: zhanghailiang
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel,
	peter.huangpeng, luis, amit.shah, pbonzini, david

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Hi Dave,
> 
> On 2015/6/16 18:26, Dr. David Alan Gilbert (git) wrote:
> >From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> >Open a return path, and handle messages that are received upon it.
> >
> >Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >---
> >  include/migration/migration.h |   8 ++
> >  migration/migration.c         | 177 +++++++++++++++++++++++++++++++++++++++++-
> >  trace-events                  |  12 +++
> >  3 files changed, 196 insertions(+), 1 deletion(-)
> >
> >diff --git a/include/migration/migration.h b/include/migration/migration.h
> >index 36caab9..868f59a 100644
> >--- a/include/migration/migration.h
> >+++ b/include/migration/migration.h
> >@@ -77,6 +77,14 @@ struct MigrationState
> >
> >      int state;
> >      MigrationParams params;
> >+
> >+    /* State related to return path */
> >+    struct {
> >+        QEMUFile     *file;
> 
> There is already a 'file' member in MigrationState,
> and since for migration, there is only one path direction, just from source side
> to destination side, so it is ok to use that name.
> 
> But for post-copy and COLO, we need two-way communication,
> so we can rename the original 'file' member of MigrationState to 'output_file',
> and add a new 'input_file' member. For the MigrationIncomingState struct, rename its original
> 'file' member to 'input_file', and add a new 'output_file'.
> IMHO, this will make things clearer.

Would the following be clearer:

  On the source make the existing migration file:
       QEMUFile  *to_dst_file;
  and for the return path
       QEMUFile  *from_dst_dile;

  and then on the destination, the incoming migration stream:
       QEMUFile  *from_src_file;
  and then the return path on the destination:
       QEMUFile  *to_src_file;

Dave

> Thanks,
> zhanghailiang
> 
> 
> >+        QemuThread    rp_thread;
> >+        bool          error;
> >+    } rp_state;
> >+
> >      double mbps;
> >      int64_t total_time;
> >      int64_t downtime;
> >diff --git a/migration/migration.c b/migration/migration.c
> >index afb19a1..fb2f491 100644
> >--- a/migration/migration.c
> >+++ b/migration/migration.c
> >@@ -278,6 +278,23 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
> >      return params;
> >  }
> >
> >+/*
> >+ * Return true if we're already in the middle of a migration
> >+ * (i.e. any of the active or setup states)
> >+ */
> >+static bool migration_already_active(MigrationState *ms)
> >+{
> >+    switch (ms->state) {
> >+    case MIGRATION_STATUS_ACTIVE:
> >+    case MIGRATION_STATUS_SETUP:
> >+        return true;
> >+
> >+    default:
> >+        return false;
> >+
> >+    }
> >+}
> >+
> >  static void get_xbzrle_cache_stats(MigrationInfo *info)
> >  {
> >      if (migrate_use_xbzrle()) {
> >@@ -441,6 +458,21 @@ static void migrate_set_state(MigrationState *s, int old_state, int new_state)
> >      }
> >  }
> >
> >+static void migrate_fd_cleanup_src_rp(MigrationState *ms)
> >+{
> >+    QEMUFile *rp = ms->rp_state.file;
> >+
> >+    /*
> >+     * When stuff goes wrong (e.g. failing destination) on the rp, it can get
> >+     * cleaned up from a few threads; make sure not to do it twice in parallel
> >+     */
> >+    rp = atomic_cmpxchg(&ms->rp_state.file, rp, NULL);
> >+    if (rp) {
> >+        trace_migrate_fd_cleanup_src_rp();
> >+        qemu_fclose(rp);
> >+    }
> >+}
> >+
> >  static void migrate_fd_cleanup(void *opaque)
> >  {
> >      MigrationState *s = opaque;
> >@@ -448,6 +480,8 @@ static void migrate_fd_cleanup(void *opaque)
> >      qemu_bh_delete(s->cleanup_bh);
> >      s->cleanup_bh = NULL;
> >
> >+    migrate_fd_cleanup_src_rp(s);
> >+
> >      if (s->file) {
> >          trace_migrate_fd_cleanup();
> >          qemu_mutex_unlock_iothread();
> >@@ -487,6 +521,11 @@ static void migrate_fd_cancel(MigrationState *s)
> >      QEMUFile *f = migrate_get_current()->file;
> >      trace_migrate_fd_cancel();
> >
> >+    if (s->rp_state.file) {
> >+        /* shutdown the rp socket, so causing the rp thread to shutdown */
> >+        qemu_file_shutdown(s->rp_state.file);
> >+    }
> >+
> >      do {
> >          old_state = s->state;
> >          if (old_state != MIGRATION_STATUS_SETUP &&
> >@@ -801,8 +840,144 @@ int64_t migrate_xbzrle_cache_size(void)
> >      return s->xbzrle_cache_size;
> >  }
> >
> >-/* migration thread support */
> >+/*
> >+ * Something bad happened to the RP stream, mark an error
> >+ * The caller shall print something to indicate why
> >+ */
> >+static void source_return_path_bad(MigrationState *s)
> >+{
> >+    s->rp_state.error = true;
> >+    migrate_fd_cleanup_src_rp(s);
> >+}
> >+
> >+/*
> >+ * Handles messages sent on the return path towards the source VM
> >+ *
> >+ */
> >+static void *source_return_path_thread(void *opaque)
> >+{
> >+    MigrationState *ms = opaque;
> >+    QEMUFile *rp = ms->rp_state.file;
> >+    uint16_t expected_len, header_len, header_type;
> >+    const int max_len = 512;
> >+    uint8_t buf[max_len];
> >+    uint32_t tmp32;
> >+    int res;
> >+
> >+    trace_source_return_path_thread_entry();
> >+    while (rp && !qemu_file_get_error(rp) &&
> >+        migration_already_active(ms)) {
> >+        trace_source_return_path_thread_loop_top();
> >+        header_type = qemu_get_be16(rp);
> >+        header_len = qemu_get_be16(rp);
> >+
> >+        switch (header_type) {
> >+        case MIG_RP_MSG_SHUT:
> >+        case MIG_RP_MSG_PONG:
> >+            expected_len = 4;
> >+            break;
> >+
> >+        default:
> >+            error_report("RP: Received invalid message 0x%04x length 0x%04x",
> >+                    header_type, header_len);
> >+            source_return_path_bad(ms);
> >+            goto out;
> >+        }
> >
> >+        if (header_len > expected_len) {
> >+            error_report("RP: Received message 0x%04x with"
> >+                    "incorrect length %d expecting %d",
> >+                    header_type, header_len,
> >+                    expected_len);
> >+            source_return_path_bad(ms);
> >+            goto out;
> >+        }
> >+
> >+        /* We know we've got a valid header by this point */
> >+        res = qemu_get_buffer(rp, buf, header_len);
> >+        if (res != header_len) {
> >+            trace_source_return_path_thread_failed_read_cmd_data();
> >+            source_return_path_bad(ms);
> >+            goto out;
> >+        }
> >+
> >+        /* OK, we have the message and the data */
> >+        switch (header_type) {
> >+        case MIG_RP_MSG_SHUT:
> >+            tmp32 = be32_to_cpup((uint32_t *)buf);
> >+            trace_source_return_path_thread_shut(tmp32);
> >+            if (tmp32) {
> >+                error_report("RP: Sibling indicated error %d", tmp32);
> >+                source_return_path_bad(ms);
> >+            }
> >+            /*
> >+             * We'll let the main thread deal with closing the RP
> >+             * we could do a shutdown(2) on it, but we're the only user
> >+             * anyway, so there's nothing gained.
> >+             */
> >+            goto out;
> >+
> >+        case MIG_RP_MSG_PONG:
> >+            tmp32 = be32_to_cpup((uint32_t *)buf);
> >+            trace_source_return_path_thread_pong(tmp32);
> >+            break;
> >+
> >+        default:
> >+            break;
> >+        }
> >+    }
> >+    if (rp && qemu_file_get_error(rp)) {
> >+        trace_source_return_path_thread_bad_end();
> >+        source_return_path_bad(ms);
> >+    }
> >+
> >+    trace_source_return_path_thread_end();
> >+out:
> >+    return NULL;
> >+}
> >+
> >+__attribute__ (( unused )) /* Until later in patch series */
> >+static int open_return_path_on_source(MigrationState *ms)
> >+{
> >+
> >+    ms->rp_state.file = qemu_file_get_return_path(ms->file);
> >+    if (!ms->rp_state.file) {
> >+        return -1;
> >+    }
> >+
> >+    trace_open_return_path_on_source();
> >+    qemu_thread_create(&ms->rp_state.rp_thread, "return path",
> >+                       source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
> >+
> >+    trace_open_return_path_on_source_continue();
> >+
> >+    return 0;
> >+}
> >+
> >+__attribute__ (( unused )) /* Until later in patch series */
> >+/* Returns 0 if the RP was ok, otherwise there was an error on the RP */
> >+static int await_return_path_close_on_source(MigrationState *ms)
> >+{
> >+    /*
> >+     * If this is a normal exit then the destination will send a SHUT and the
> >+     * rp_thread will exit, however if there's an error we need to cause
> >+     * it to exit, which we can do by a shutdown.
> >+     * (canceling must also shutdown to stop us getting stuck here if
> >+     * the destination died at just the wrong place)
> >+     */
> >+    if (qemu_file_get_error(ms->file) && ms->rp_state.file) {
> >+        qemu_file_shutdown(ms->rp_state.file);
> >+    }
> >+    trace_await_return_path_close_on_source_joining();
> >+    qemu_thread_join(&ms->rp_state.rp_thread);
> >+    trace_await_return_path_close_on_source_close();
> >+    return ms->rp_state.error;
> >+}
> >+
> >+/*
> >+ * Master migration thread on the source VM.
> >+ * It drives the migration and pumps the data down the outgoing channel.
> >+ */
> >  static void *migration_thread(void *opaque)
> >  {
> >      MigrationState *s = opaque;
> >diff --git a/trace-events b/trace-events
> >index 5738e3f..282cde1 100644
> >--- a/trace-events
> >+++ b/trace-events
> >@@ -1394,12 +1394,24 @@ flic_no_device_api(int err) "flic: no Device Contral API support %d"
> >  flic_reset_failed(int err) "flic: reset failed %d"
> >
> >  # migration.c
> >+await_return_path_close_on_source_close(void) ""
> >+await_return_path_close_on_source_joining(void) ""
> >  migrate_set_state(int new_state) "new state %d"
> >  migrate_fd_cleanup(void) ""
> >+migrate_fd_cleanup_src_rp(void) ""
> >  migrate_fd_error(void) ""
> >  migrate_fd_cancel(void) ""
> >  migrate_pending(uint64_t size, uint64_t max) "pending size %" PRIu64 " max %" PRIu64
> >  migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
> >+open_return_path_on_source(void) ""
> >+open_return_path_on_source_continue(void) ""
> >+source_return_path_thread_bad_end(void) ""
> >+source_return_path_thread_end(void) ""
> >+source_return_path_thread_entry(void) ""
> >+source_return_path_thread_failed_read_cmd_data(void) ""
> >+source_return_path_thread_loop_top(void) ""
> >+source_return_path_thread_pong(uint32_t val) "%x"
> >+source_return_path_thread_shut(uint32_t val) "%x"
> >  migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
> >
> >  # migration/rdma.c
> >
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 15/42] Return path: Source handling of return path
  2015-08-18 10:45     ` Dr. David Alan Gilbert
@ 2015-08-18 11:29       ` zhanghailiang
  0 siblings, 0 replies; 209+ messages in thread
From: zhanghailiang @ 2015-08-18 11:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, quintela, liang.z.li, peter.huangpeng,
	qemu-devel, luis, amit.shah, pbonzini, david

On 2015/8/18 18:45, Dr. David Alan Gilbert wrote:
> * zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
>> Hi Dave,
>>
>> On 2015/6/16 18:26, Dr. David Alan Gilbert (git) wrote:
>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>
>>> Open a return path, and handle messages that are received upon it.
>>>
>>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>> ---
>>>   include/migration/migration.h |   8 ++
>>>   migration/migration.c         | 177 +++++++++++++++++++++++++++++++++++++++++-
>>>   trace-events                  |  12 +++
>>>   3 files changed, 196 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/migration/migration.h b/include/migration/migration.h
>>> index 36caab9..868f59a 100644
>>> --- a/include/migration/migration.h
>>> +++ b/include/migration/migration.h
>>> @@ -77,6 +77,14 @@ struct MigrationState
>>>
>>>       int state;
>>>       MigrationParams params;
>>> +
>>> +    /* State related to return path */
>>> +    struct {
>>> +        QEMUFile     *file;
>>
>> There is already a 'file' member in MigrationState,
>> and since for migration, there is only one path direction, just from source side
>> to destination side, so it is ok to use that name.
>>
>> But for post-copy and COLO, we need two-way communication,
>> so we can rename the original 'file' member of MigrationState to 'output_file',
>> and add a new 'input_file' member. For the MigrationIncomingState struct, rename its original
>> 'file' member to 'input_file', and add a new 'output_file'.
>> IMHO, this will make things clearer.
>
> Would the following be clearer:
>

Yes, it is clearer and more graceful :)

>    On the source make the existing migration file:
>         QEMUFile  *to_dst_file;
>    and for the return path
>         QEMUFile  *from_dst_dile;
>                             ^
                      from_dst_file

>    and then on the destination, the incoming migration stream:
>         QEMUFile  *from_src_file;
>    and then the return path on the destionation:
>         QEMUFile  *to_src_file;
>
> Dave
>
>> Thanks,
>> zhanghailiang
>>
>>
>>> +        QemuThread    rp_thread;
>>> +        bool          error;
>>> +    } rp_state;
>>> +
>>>       double mbps;
>>>       int64_t total_time;
>>>       int64_t downtime;
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index afb19a1..fb2f491 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -278,6 +278,23 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>>>       return params;
>>>   }
>>>
>>> +/*
>>> + * Return true if we're already in the middle of a migration
>>> + * (i.e. any of the active or setup states)
>>> + */
>>> +static bool migration_already_active(MigrationState *ms)
>>> +{
>>> +    switch (ms->state) {
>>> +    case MIGRATION_STATUS_ACTIVE:
>>> +    case MIGRATION_STATUS_SETUP:
>>> +        return true;
>>> +
>>> +    default:
>>> +        return false;
>>> +
>>> +    }
>>> +}
>>> +
>>>   static void get_xbzrle_cache_stats(MigrationInfo *info)
>>>   {
>>>       if (migrate_use_xbzrle()) {
>>> @@ -441,6 +458,21 @@ static void migrate_set_state(MigrationState *s, int old_state, int new_state)
>>>       }
>>>   }
>>>
>>> +static void migrate_fd_cleanup_src_rp(MigrationState *ms)
>>> +{
>>> +    QEMUFile *rp = ms->rp_state.file;
>>> +
>>> +    /*
>>> +     * When stuff goes wrong (e.g. failing destination) on the rp, it can get
>>> +     * cleaned up from a few threads; make sure not to do it twice in parallel
>>> +     */
>>> +    rp = atomic_cmpxchg(&ms->rp_state.file, rp, NULL);
>>> +    if (rp) {
>>> +        trace_migrate_fd_cleanup_src_rp();
>>> +        qemu_fclose(rp);
>>> +    }
>>> +}
>>> +
>>>   static void migrate_fd_cleanup(void *opaque)
>>>   {
>>>       MigrationState *s = opaque;
>>> @@ -448,6 +480,8 @@ static void migrate_fd_cleanup(void *opaque)
>>>       qemu_bh_delete(s->cleanup_bh);
>>>       s->cleanup_bh = NULL;
>>>
>>> +    migrate_fd_cleanup_src_rp(s);
>>> +
>>>       if (s->file) {
>>>           trace_migrate_fd_cleanup();
>>>           qemu_mutex_unlock_iothread();
>>> @@ -487,6 +521,11 @@ static void migrate_fd_cancel(MigrationState *s)
>>>       QEMUFile *f = migrate_get_current()->file;
>>>       trace_migrate_fd_cancel();
>>>
>>> +    if (s->rp_state.file) {
>>> +        /* shutdown the rp socket, so causing the rp thread to shutdown */
>>> +        qemu_file_shutdown(s->rp_state.file);
>>> +    }
>>> +
>>>       do {
>>>           old_state = s->state;
>>>           if (old_state != MIGRATION_STATUS_SETUP &&
>>> @@ -801,8 +840,144 @@ int64_t migrate_xbzrle_cache_size(void)
>>>       return s->xbzrle_cache_size;
>>>   }
>>>
>>> -/* migration thread support */
>>> +/*
>>> + * Something bad happened to the RP stream, mark an error
>>> + * The caller shall print something to indicate why
>>> + */
>>> +static void source_return_path_bad(MigrationState *s)
>>> +{
>>> +    s->rp_state.error = true;
>>> +    migrate_fd_cleanup_src_rp(s);
>>> +}
>>> +
>>> +/*
>>> + * Handles messages sent on the return path towards the source VM
>>> + *
>>> + */
>>> +static void *source_return_path_thread(void *opaque)
>>> +{
>>> +    MigrationState *ms = opaque;
>>> +    QEMUFile *rp = ms->rp_state.file;
>>> +    uint16_t expected_len, header_len, header_type;
>>> +    const int max_len = 512;
>>> +    uint8_t buf[max_len];
>>> +    uint32_t tmp32;
>>> +    int res;
>>> +
>>> +    trace_source_return_path_thread_entry();
>>> +    while (rp && !qemu_file_get_error(rp) &&
>>> +        migration_already_active(ms)) {
>>> +        trace_source_return_path_thread_loop_top();
>>> +        header_type = qemu_get_be16(rp);
>>> +        header_len = qemu_get_be16(rp);
>>> +
>>> +        switch (header_type) {
>>> +        case MIG_RP_MSG_SHUT:
>>> +        case MIG_RP_MSG_PONG:
>>> +            expected_len = 4;
>>> +            break;
>>> +
>>> +        default:
>>> +            error_report("RP: Received invalid message 0x%04x length 0x%04x",
>>> +                    header_type, header_len);
>>> +            source_return_path_bad(ms);
>>> +            goto out;
>>> +        }
>>>
>>> +        if (header_len > expected_len) {
>>> +            error_report("RP: Received message 0x%04x with"
>>> +                    "incorrect length %d expecting %d",
>>> +                    header_type, header_len,
>>> +                    expected_len);
>>> +            source_return_path_bad(ms);
>>> +            goto out;
>>> +        }
>>> +
>>> +        /* We know we've got a valid header by this point */
>>> +        res = qemu_get_buffer(rp, buf, header_len);
>>> +        if (res != header_len) {
>>> +            trace_source_return_path_thread_failed_read_cmd_data();
>>> +            source_return_path_bad(ms);
>>> +            goto out;
>>> +        }
>>> +
>>> +        /* OK, we have the message and the data */
>>> +        switch (header_type) {
>>> +        case MIG_RP_MSG_SHUT:
>>> +            tmp32 = be32_to_cpup((uint32_t *)buf);
>>> +            trace_source_return_path_thread_shut(tmp32);
>>> +            if (tmp32) {
>>> +                error_report("RP: Sibling indicated error %d", tmp32);
>>> +                source_return_path_bad(ms);
>>> +            }
>>> +            /*
>>> +             * We'll let the main thread deal with closing the RP
>>> +             * we could do a shutdown(2) on it, but we're the only user
>>> +             * anyway, so there's nothing gained.
>>> +             */
>>> +            goto out;
>>> +
>>> +        case MIG_RP_MSG_PONG:
>>> +            tmp32 = be32_to_cpup((uint32_t *)buf);
>>> +            trace_source_return_path_thread_pong(tmp32);
>>> +            break;
>>> +
>>> +        default:
>>> +            break;
>>> +        }
>>> +    }
>>> +    if (rp && qemu_file_get_error(rp)) {
>>> +        trace_source_return_path_thread_bad_end();
>>> +        source_return_path_bad(ms);
>>> +    }
>>> +
>>> +    trace_source_return_path_thread_end();
>>> +out:
>>> +    return NULL;
>>> +}
>>> +
>>> +__attribute__ (( unused )) /* Until later in patch series */
>>> +static int open_return_path_on_source(MigrationState *ms)
>>> +{
>>> +
>>> +    ms->rp_state.file = qemu_file_get_return_path(ms->file);
>>> +    if (!ms->rp_state.file) {
>>> +        return -1;
>>> +    }
>>> +
>>> +    trace_open_return_path_on_source();
>>> +    qemu_thread_create(&ms->rp_state.rp_thread, "return path",
>>> +                       source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
>>> +
>>> +    trace_open_return_path_on_source_continue();
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +__attribute__ (( unused )) /* Until later in patch series */
>>> +/* Returns 0 if the RP was ok, otherwise there was an error on the RP */
>>> +static int await_return_path_close_on_source(MigrationState *ms)
>>> +{
>>> +    /*
>>> +     * If this is a normal exit then the destination will send a SHUT and the
>>> +     * rp_thread will exit, however if there's an error we need to cause
>>> +     * it to exit, which we can do by a shutdown.
>>> +     * (canceling must also shutdown to stop us getting stuck here if
>>> +     * the destination died at just the wrong place)
>>> +     */
>>> +    if (qemu_file_get_error(ms->file) && ms->rp_state.file) {
>>> +        qemu_file_shutdown(ms->rp_state.file);
>>> +    }
>>> +    trace_await_return_path_close_on_source_joining();
>>> +    qemu_thread_join(&ms->rp_state.rp_thread);
>>> +    trace_await_return_path_close_on_source_close();
>>> +    return ms->rp_state.error;
>>> +}
>>> +
>>> +/*
>>> + * Master migration thread on the source VM.
>>> + * It drives the migration and pumps the data down the outgoing channel.
>>> + */
>>>   static void *migration_thread(void *opaque)
>>>   {
>>>       MigrationState *s = opaque;
>>> diff --git a/trace-events b/trace-events
>>> index 5738e3f..282cde1 100644
>>> --- a/trace-events
>>> +++ b/trace-events
>>> @@ -1394,12 +1394,24 @@ flic_no_device_api(int err) "flic: no Device Contral API support %d"
>>>   flic_reset_failed(int err) "flic: reset failed %d"
>>>
>>>   # migration.c
>>> +await_return_path_close_on_source_close(void) ""
>>> +await_return_path_close_on_source_joining(void) ""
>>>   migrate_set_state(int new_state) "new state %d"
>>>   migrate_fd_cleanup(void) ""
>>> +migrate_fd_cleanup_src_rp(void) ""
>>>   migrate_fd_error(void) ""
>>>   migrate_fd_cancel(void) ""
>>>   migrate_pending(uint64_t size, uint64_t max) "pending size %" PRIu64 " max %" PRIu64
>>>   migrate_send_rp_message(int msg_type, uint16_t len) "%d: len %d"
>>> +open_return_path_on_source(void) ""
>>> +open_return_path_on_source_continue(void) ""
>>> +source_return_path_thread_bad_end(void) ""
>>> +source_return_path_thread_end(void) ""
>>> +source_return_path_thread_entry(void) ""
>>> +source_return_path_thread_failed_read_cmd_data(void) ""
>>> +source_return_path_thread_loop_top(void) ""
>>> +source_return_path_thread_pong(uint32_t val) "%x"
>>> +source_return_path_thread_shut(uint32_t val) "%x"
>>>   migrate_transferred(uint64_t tranferred, uint64_t time_spent, double bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %g max_size %" PRId64
>>>
>>>   # migration/rdma.c
>>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> .
>

* Re: [Qemu-devel] [PATCH v7 18/42] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
  2015-07-13 11:02   ` Juan Quintela
  2015-07-20 10:13     ` Amit Shah
@ 2015-08-26 14:48     ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-08-26 14:48 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > The state of the postcopy process is managed via a series of messages;
> >    * Add wrappers and handlers for sending/receiving these messages
> >    * Add state variable that track the current state of postcopy
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  include/migration/migration.h |  16 +++
> >  include/sysemu/sysemu.h       |  20 ++++
> >  migration/migration.c         |  13 +++
> >  migration/savevm.c            | 247 ++++++++++++++++++++++++++++++++++++++++++
> >  trace-events                  |  10 ++
> >  5 files changed, 306 insertions(+)
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index cd89a9b..34cd9a6 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1128,3 +1128,16 @@ void migrate_fd_connect(MigrationState *s)
> >      qemu_thread_create(&s->thread, "migration", migration_thread, s,
> >                         QEMU_THREAD_JOINABLE);
> >  }
> > +
> > +PostcopyState  postcopy_state_get(MigrationIncomingState *mis)
> > +{
> > +    return atomic_fetch_add(&mis->postcopy_state, 0);
> 
> What is wrong with atomic_read() here?
> As the set of the state is atomic, even a normal read would do (I think)

Actually, I made this an atomic_mb_read as per Paolo's comment on my v5
version (31st March).
I also added a comment documenting which threads read/write the state.
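
To illustrate the difference being discussed, here is a minimal sketch using C11 `<stdatomic.h>` as a stand-in. QEMU's `atomic_mb_read` is its own macro, so this is an analogy rather than the code in the series, and the state names and function names below are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical subset of postcopy states, for illustration only */
typedef enum {
    PC_SKETCH_NONE = 0,
    PC_SKETCH_ADVISE,
    PC_SKETCH_RUNNING,
} pc_sketch_state;

static _Atomic int pc_sketch_var = PC_SKETCH_NONE;

/*
 * Reader side: a sequentially consistent load is enough; there is no
 * need for a read-modify-write such as fetch_add(..., 0).
 */
pc_sketch_state pc_sketch_get(void)
{
    return (pc_sketch_state)atomic_load(&pc_sketch_var);
}

/* Writer side: store the new state and hand back the previous one */
pc_sketch_state pc_sketch_set(pc_sketch_state new_state)
{
    return (pc_sketch_state)atomic_exchange(&pc_sketch_var, new_state);
}
```

The load avoids taking the cache line exclusively on every read, which `atomic_fetch_add(&x, 0)` would do, while still giving the ordering guarantee a plain read lacks.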

> > +void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
> > +                                           uint16_t len,
> > +                                           uint64_t *start_list,
> > +                                           uint64_t *end_list)
> 
> I haven't looked at the following patches where this function is used,
> but it appears that getting an iovec could be a good idea?

Yes, although I wouldn't want to make the wire format dependent on the
host size_t or pointer size or anything.

> 
> > +{
> > +    uint8_t *buf;
> > +    uint16_t tmplen;
> > +    uint16_t t;
> > +    size_t name_len = strlen(name);
> > +
> > +    trace_qemu_savevm_send_postcopy_ram_discard(name, len);
> > +    buf = g_malloc0(len*16 + name_len + 3);
> 
> I would suggest
>        g_malloc0(1 + 1 + name_len + 1 + (8 + 8) * len)
> 
>        just to be clear where things came from.

Done.

>        I think that we don't need the \0 at all.  If \0 is not there,
>        strlen() return is going to be "funny".  So, we can just change
>        the assert to name_len < 255?

Dave Gibson asked for the \0 in a previous review.

> 
> > +    buf[0] = 0; /* Version */
> > +    assert(name_len < 256);
> 
> Can we move the assert before the malloc()?

Done.

> My guess is that in a perfect world the assert would be a return
> -EINVAL, but I know that it is complicated.
> 
> > +    buf[1] = name_len;
> > +    memcpy(buf+2, name, name_len);
> 
> spaces around '+' (same around)

Done.

> 
> > +    tmplen = 2+name_len;
> > +    buf[tmplen++] = '\0';
> > +
> > +    for (t = 0; t < len; t++) {
> > +        cpu_to_be64w((uint64_t *)(buf + tmplen), start_list[t]);
> > +        tmplen += 8;
> > +        cpu_to_be64w((uint64_t *)(buf + tmplen), end_list[t]);
> > +        tmplen += 8;
>            trace_qemu_savevm_send_postcopy_range(name, start_list[t], end_list[t]);
> 
> ??

???
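
Pulling the send-side review points together (length check before allocation, the spelled-out size, the trailing '\0'), the buffer layout can be sketched standalone as below. This is an illustration under assumptions, not the revised patch: `build_discard_msg()` and `put_be64()` are hypothetical names, libc `calloc()` replaces `g_malloc0()`, the length check returns NULL instead of asserting, and the big-endian store is done byte by byte so that it does not depend on buffer alignment.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Store a 64-bit value big-endian, one byte at a time */
static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/*
 * Layout: version byte, name length byte, name, '\0', then `len`
 * pairs of 64-bit big-endian (start, end) values.
 * Returns a malloc'd buffer and its size in *out_len, or NULL on error.
 */
uint8_t *build_discard_msg(const char *name, uint16_t len,
                           const uint64_t *start_list,
                           const uint64_t *end_list, size_t *out_len)
{
    size_t name_len = strlen(name);
    size_t pos;
    uint8_t *buf;

    /* Checked before allocating: the name length must fit in one byte */
    if (name_len >= 256) {
        return NULL;
    }

    buf = calloc(1, 1 + 1 + name_len + 1 + (8 + 8) * (size_t)len);
    if (!buf) {
        return NULL;
    }

    buf[0] = 0;                      /* Version */
    buf[1] = (uint8_t)name_len;
    memcpy(buf + 2, name, name_len);
    pos = 2 + name_len;
    buf[pos++] = '\0';

    for (uint16_t t = 0; t < len; t++) {
        put_be64(buf + pos, start_list[t]);
        pos += 8;
        put_be64(buf + pos, end_list[t]);
        pos += 8;
    }

    *out_len = pos;
    return buf;
}
```

Every field has a fixed width on the wire, so nothing in the format depends on the host's `size_t` or pointer size.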

> > +    /* We're expecting a
> > +     *    Version (0)
> > +     *    a RAM ID string (length byte, name, 0 term)
> > +     *    then at least 1 16 byte chunk
> > +    */
> > +    if (len < 20) { 1 +
> 
>        1+1+1+1+2*8

Done.

> Humm, thinking about it, .... why don't we need a field for the
> number of entries?

Because we've got the size of the whole message from the command header.
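
To make the framing above concrete, here is a minimal sketch of the layout as
described in this thread: a version byte, a one-byte name length, the name, a
'\0' terminator, then 16-byte (start, end) pairs in big-endian order. The
number of pairs is recovered purely from the total message length. All of the
function names below are hypothetical and are not the ones used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 64-bit value big-endian, byte by byte (no alignment assumptions). */
static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Bytes needed: version + name length + name + '\0' + (start, end) pairs. */
static size_t discard_msg_size(size_t name_len, size_t nranges)
{
    return 1 + 1 + name_len + 1 + (8 + 8) * nranges;
}

/* Encode into buf (caller provides discard_msg_size() bytes); returns bytes written. */
static size_t discard_msg_encode(uint8_t *buf, const char *name,
                                 const uint64_t *start_list,
                                 const uint64_t *end_list, size_t nranges)
{
    size_t name_len = strlen(name);
    size_t pos = 0;

    assert(name_len < 256);          /* must fit the single length byte */
    buf[pos++] = 0;                  /* version */
    buf[pos++] = (uint8_t)name_len;
    memcpy(buf + pos, name, name_len);
    pos += name_len;
    buf[pos++] = '\0';
    for (size_t t = 0; t < nranges; t++) {
        put_be64(buf + pos, start_list[t]);
        pos += 8;
        put_be64(buf + pos, end_list[t]);
        pos += 8;
    }
    return pos;
}

/*
 * Recover the number of ranges from the total message length alone.
 * Returns -1 if the message is too short or the tail is not a whole
 * number of 16-byte pairs.
 */
static long discard_msg_count(const uint8_t *buf, size_t total_len)
{
    size_t header;

    if (total_len < 1 + 1 + 1 + 1 + 2 * 8) {
        return -1;
    }
    header = 1 + 1 + (size_t)buf[1] + 1;
    if (total_len < header + 16 || (total_len - header) % 16 != 0) {
        return -1;
    }
    return (long)((total_len - header) / 16);
}

/* Read range number idx (0-based) back out of an encoded message. */
static void discard_msg_range(const uint8_t *buf, size_t idx,
                              uint64_t *start, uint64_t *end)
{
    const uint8_t *p = buf + 1 + 1 + buf[1] + 1 + 16 * idx;

    *start = get_be64(p);
    *end = get_be64(p + 8);
}
```

Writing the values out byte by byte keeps the wire format independent of the
host's size_t, pointer size and endianness, which is the concern raised
earlier in this reply.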

> > +        error_report("CMD_POSTCOPY_RAM_DISCARD invalid length (%d)", len);
> > +        return -1;
> > +    }
> > +
> > +    tmp = qemu_get_byte(mis->file);
> > +    if (tmp != 0) {
> 
> I think that a constant telling POSTCOPY_VERSION0 or whatever?

Done; (as a const postcopy_ram_discard_version)

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 25/42] Postcopy: Maintain sentmap and calculate discard
  2015-07-13 11:47   ` Juan Quintela
@ 2015-09-15 17:01     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-09-15 17:01 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Where postcopy is preceded by a period of precopy, the destination will
> > have received pages that may have been dirtied on the source after the
> > page was sent.  The destination must throw these pages away before
> > starting its CPUs.
> >
> > Maintain a 'sentmap' of pages that have already been sent.
> > Calculate list of sent & dirty pages
> > Provide helpers on the destination side to discard these.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Not a patch without a suggestion O:-)
> 
> > ---
> >  include/migration/migration.h    |  12 +++
> >  include/migration/postcopy-ram.h |  35 +++++++
> >  include/qemu/typedefs.h          |   1 +
> >  migration/migration.c            |   1 +
> >  migration/postcopy-ram.c         | 108 +++++++++++++++++++++
> >  migration/ram.c                  | 203 ++++++++++++++++++++++++++++++++++++++-
> >  migration/savevm.c               |   2 -
> >  trace-events                     |   5 +
> >  8 files changed, 363 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index 2a22381..4c6cf95 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -114,6 +114,13 @@ struct MigrationState
> >  
> >      /* Flag set once the migration has been asked to enter postcopy */
> >      bool start_postcopy;
> > +
> > +    /* bitmap of pages that have been sent at least once
> > +     * only maintained and used in postcopy at the moment
> > +     * where it's used to send the dirtymap at the start
> > +     * of the postcopy phase
> > +     */
> > +    unsigned long *sentmap;
> >  };
> 
> We can use this sentmap for zero page optimization.  If page is on
> sentmap, we need to send a zero page, otherwise, just send sentmap at
> the end of migration and clean everything not there?

Just as a compact way of sending zero pages? I'm not sure it would help.

> > +/*
> > + * Discard the contents of memory start..end inclusive.
> > + * We can assume that if we've been called postcopy_ram_hosttest returned true
> > + */
> > +int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
> > +                               uint8_t *end)
> > +{
> > +    trace_postcopy_ram_discard_range(start, end);
> > +    if (madvise(start, (end-start)+1, MADV_DONTNEED)) {
> 
> Can we s/end/length/ and adjust everywhere?

Done - partially; everything that works in bytes is now start & length,
everything that works in indexes into the RAM bitmap is still start/end,
since generally that's what they're working with already.

> Not here, but putting a comment explaining where magic 12 comes from on
> definition of constant?

Done.

> I think that the sentbitmap bits could be used without the rest.

Possibly; I can see it could be useful, but I'm not yet convinced what else it
would be used for.
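
For reference, the "sent & dirty" calculation described in the commit message
can be sketched roughly as below. This is an illustration only, not the code
from the patch: the bitmap helpers and calc_discard_ranges() are hypothetical
names, and the ranges are inclusive page indexes into the bitmaps (the
start/end convention mentioned above for anything that works in bitmap
indexes).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static bool bit_is_set(const unsigned long *map, size_t nr)
{
    return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

static void bit_set(unsigned long *map, size_t nr)
{
    map[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

/* One run of pages to discard, as inclusive page indexes into the bitmaps. */
struct discard_range {
    size_t start;
    size_t end;
};

/* A page needs discarding only if it was sent and has been dirtied since. */
static bool needs_discard(const unsigned long *sentmap,
                          const unsigned long *dirtymap, size_t page)
{
    return bit_is_set(sentmap, page) && bit_is_set(dirtymap, page);
}

/*
 * Walk npages bits and record each maximal run of sent-and-dirty pages.
 * At most max_out ranges are stored; the return value is the number of
 * runs found, which may exceed max_out.
 */
static size_t calc_discard_ranges(const unsigned long *sentmap,
                                  const unsigned long *dirtymap,
                                  size_t npages,
                                  struct discard_range *out, size_t max_out)
{
    size_t found = 0;
    size_t page = 0;

    while (page < npages) {
        size_t start;

        if (!needs_discard(sentmap, dirtymap, page)) {
            page++;
            continue;
        }
        start = page;
        while (page < npages && needs_discard(sentmap, dirtymap, page)) {
            page++;
        }
        if (found < max_out) {
            out[found].start = start;
            out[found].end = page - 1;
        }
        found++;
    }
    return found;
}
```

Pages that are dirty but were never sent produce no range, since the
destination has nothing stale to throw away for them.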

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 32/42] Page request: Consume pages off the post-copy queue
  2015-07-14  9:40   ` Juan Quintela
@ 2015-09-16 18:36     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-09-16 18:36 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > When transmitting RAM pages, consume pages that have been queued by
> > MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.
> >
> > Note:
> >   a) After a queued page the linear walk carries on from after the
> > unqueued page; there is a reasonable chance that the destination
> > was about to ask for other closeby pages anyway.
> >
> >   b) We have to be careful of any assumptions that the page walking
> > code makes, in particular it does some short cuts on its first linear
> > walk that break as soon as we do a queued page.
> >
> >   c) We have to be careful to not break up host-page size chunks, since
> > this makes it harder to place the pages on the destination.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> 
> > +static bool last_was_from_queue;
> 
> Are we using this variable later in the series?

That was a straggler; I've killed it off.

> >  static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
> >  {
> >      migration_dirty_pages +=
> > @@ -923,6 +933,41 @@ static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block,
> >      return pages;
> >  }
> >  
> > +/*
> > + * Unqueue a page from the queue fed by postcopy page requests
> > + *
> > + * Returns:      The RAMBlock* to transmit from (or NULL if the queue is empty)
> > + *      ms:      MigrationState in
> > + *  offset:      the byte offset within the RAMBlock for the start of the page
> > + * ram_addr_abs: global offset in the dirty/sent bitmaps
> > + */
> > +static RAMBlock *ram_save_unqueue_page(MigrationState *ms, ram_addr_t *offset,
> > +                                       ram_addr_t *ram_addr_abs)
> > +{
> > +    RAMBlock *result = NULL;
> > +    qemu_mutex_lock(&ms->src_page_req_mutex);
> > +    if (!QSIMPLEQ_EMPTY(&ms->src_page_requests)) {
> > +        struct MigrationSrcPageRequest *entry =
> > +                                    QSIMPLEQ_FIRST(&ms->src_page_requests);
> > +        result = entry->rb;
> > +        *offset = entry->offset;
> > +        *ram_addr_abs = (entry->offset + entry->rb->offset) & TARGET_PAGE_MASK;
> > +
> > +        if (entry->len > TARGET_PAGE_SIZE) {
> > +            entry->len -= TARGET_PAGE_SIZE;
> > +            entry->offset += TARGET_PAGE_SIZE;
> > +        } else {
> > +            memory_region_unref(result->mr);
> 
> Here it is the unref, but I still don't understand why we don't need to
> undo that on the error case on previous patch.

I've added an unref to the 'flush_page_queue' routine that's
called during cleanup; thus we take a ref whenever anything is added to
the queue, and release it either when we remove it to use it, or during
cleanup.
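
The reference rule above (take a ref on every add, drop it either when the
entry is fully consumed or when the queue is flushed) can be modelled as in
the sketch below. It is a simplified stand-in: struct region replaces the
MemoryRegion, a plain linked list replaces QSIMPLEQ, and the locking with
src_page_req_mutex is left out. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct region {
    int refs;                 /* stand-in for a MemoryRegion refcount */
};

struct page_req {
    struct region *rgn;
    size_t offset;
    size_t len;
    struct page_req *next;
};

struct req_queue {
    struct page_req *head;
    struct page_req *tail;
};

/* Queue a request; the queue holds one reference per entry. */
static int queue_add(struct req_queue *q, struct region *rgn,
                     size_t offset, size_t len)
{
    struct page_req *e = malloc(sizeof(*e));

    if (!e) {
        return -1;
    }
    rgn->refs++;
    e->rgn = rgn;
    e->offset = offset;
    e->len = len;
    e->next = NULL;
    if (q->tail) {
        q->tail->next = e;
    } else {
        q->head = e;
    }
    q->tail = e;
    return 0;
}

/*
 * Take one page from the head entry.  The entry (and its reference)
 * is released only once its last page has been handed out.
 */
static struct region *queue_take_page(struct req_queue *q, size_t page_size,
                                      size_t *offset)
{
    struct page_req *e = q->head;
    struct region *rgn;

    if (!e) {
        return NULL;
    }
    rgn = e->rgn;
    *offset = e->offset;
    if (e->len > page_size) {
        e->len -= page_size;
        e->offset += page_size;
    } else {
        q->head = e->next;
        if (!q->head) {
            q->tail = NULL;
        }
        rgn->refs--;
        free(e);
    }
    return rgn;
}

/* Cleanup path: drop every remaining entry together with its reference. */
static void queue_flush(struct req_queue *q)
{
    while (q->head) {
        struct page_req *e = q->head;

        q->head = e->next;
        e->rgn->refs--;
        free(e);
    }
    q->tail = NULL;
}
```

With this pairing, the number of live entries always equals the number of
references the queue holds, whichever path removes an entry.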

> > +            QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
> > +            g_free(entry);
> > +        }
> > +    }
> > +    qemu_mutex_unlock(&ms->src_page_req_mutex);
> > +
> > +    return result;
> > +}
> > +
> > +
> >  /**
> >   * Queue the pages for transmission, e.g. a request from postcopy destination
> >   *   ms: MigrationStatus in which the queue is held
> > @@ -987,6 +1032,58 @@ err:
> >  
> 
> > @@ -997,65 +1094,102 @@ err:
> >   * @f: QEMUFile where to send the data
> >   * @last_stage: if we are at the completion stage
> >   * @bytes_transferred: increase it with the number of transferred bytes
> > + *
> > + * On systems where host-page-size > target-page-size it will send all the
> > + * pages in a host page that are dirty.
> >   */
> >  
> >  static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
> >                                     uint64_t *bytes_transferred)
> >  {
> > +    MigrationState *ms = migrate_get_current();
> >      RAMBlock *block = last_seen_block;
> > +    RAMBlock *tmpblock;
> >      ram_addr_t offset = last_offset;
> > +    ram_addr_t tmpoffset;
> >      bool complete_round = false;
> >      int pages = 0;
> > -    MemoryRegion *mr;
> >      ram_addr_t dirty_ram_abs; /* Address of the start of the dirty page in
> >                                   ram_addr_t space */
> >  
> > -    if (!block)
> > +    if (!block) {
> >          block = QLIST_FIRST_RCU(&ram_list.blocks);
> > +        last_was_from_queue = false;
> > +    }
> >  
> > -    while (true) {
> > -        mr = block->mr;
> > -        offset = migration_bitmap_find_and_reset_dirty(mr, offset,
> > -                                                       &dirty_ram_abs);
> > -        if (complete_round && block == last_seen_block &&
> > -            offset >= last_offset) {
> > -            break;
> > -        }
> > -        if (offset >= block->used_length) {
> > -            offset = 0;
> > -            block = QLIST_NEXT_RCU(block, next);
> > -            if (!block) {
> > -                block = QLIST_FIRST_RCU(&ram_list.blocks);
> > -                complete_round = true;
> > -                ram_bulk_stage = false;
> > -                if (migrate_use_xbzrle()) {
> > -                    /* If xbzrle is on, stop using the data compression at this
> > -                     * point. In theory, xbzrle can do better than compression.
> > -                     */
> > -                    flush_compressed_data(f);
> > -                    compression_switch = false;
> > -                }
> > +    while (true) { /* Until we send a block or run out of stuff to send */
> > +        tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &dirty_ram_abs);
> 
> This function was ugly.  You already split it in the past.  This patch
> makes it even more complicated.  Can we try something like add a
> 
> ram_find_next_page() and try to put some of the code inside the while
> there?

I've just posted a pair of patches separately that do this; please let
me know if they're on the right lines; they can be applied without postcopy.

> Once here, can we agree to send the next N pages (if they are contiguous)
> if we receive a queued request?  Yeap, deciding N means testing and measuring.
> And can wait for this to be integrated.

Yes we could do that; at the moment I'm working in host page sized chunks.
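
As a rough illustration of what "host page sized chunks" means here: starting
at the requested offset, every dirty target page up to the end of the
enclosing host page is sent in one go. The sketch below models the loop with
a plain bool array in place of the migration bitmap; send_host_page() and
its parameters are hypothetical, and "sending" is reduced to clearing the
dirty flag and counting the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Walk from *offset to the end of the host page that contains it and
 * "send" each dirty target page.  dirty[] holds one flag per target page
 * and must cover the whole host page.  On return *offset is the last
 * target page looked at.  Returns the number of pages sent.
 */
static int send_host_page(bool *dirty, size_t *offset,
                          size_t target_page_size, size_t host_page_size)
{
    int pages = 0;

    assert(target_page_size > 0);
    assert(host_page_size % target_page_size == 0);

    do {
        size_t idx = *offset / target_page_size;

        if (dirty[idx]) {
            dirty[idx] = false;     /* stands in for transmitting the page */
            pages++;
        }
        *offset += target_page_size;
    } while (*offset % host_page_size);

    /* The offset we leave with is the last one we looked at. */
    *offset -= target_page_size;
    return pages;
}
```

Extending this to "the next N pages" would mainly mean replacing the host
page boundary test with a counter, with N picked by measurement.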

> > +
> > +        if (tmpblock) {
> > +            /* We've got a block from the postcopy queue */
> > +            trace_ram_find_and_save_block_postcopy(tmpblock->idstr,
> > +                                                   (uint64_t)tmpoffset,
> > +                                                   (uint64_t)dirty_ram_abs);
> > +            /*
> > +             * We're sending this page, and since it's postcopy nothing else
> > +             * will dirty it, and we must make sure it doesn't get sent again
> > +             * even if this queue request was received after the background
> > +             * search already sent it.
> > +             */
> > +            if (!test_bit(dirty_ram_abs >> TARGET_PAGE_BITS,
> > +                          migration_bitmap)) {
> 
> I think this test can be inside ram_save_unqueue_page()
> 
> I.e. rename to:
> 
> ram_save_get_next_queued_page()

Renamed to the shorter get_queued_page (it's static anyway).

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 32/42] Page request: Consume pages off the post-copy queue
  2015-07-27  6:05   ` Amit Shah
@ 2015-09-16 18:48     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-09-16 18:48 UTC (permalink / raw)
  To: Amit Shah
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 16 Jun 2015 [11:26:45], Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > When transmitting RAM pages, consume pages that have been queued by
> > MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.
> 
> It's slightly confusing with 'consume': we're /servicing/ requests from
> the dest at the src here rather than /consuming/ pages sent by src at
> the dest.  If you find 'service' better than 'consume', please update
> the commit msg+log.

'consume' is a fairly normal term for taking an item off a queue and
processing it.

> > Note:
> >   a) After a queued page the linear walk carries on from after the
> > unqueued page; there is a reasonable chance that the destination
> > was about to ask for other closeby pages anyway.
> > 
> >   b) We have to be careful of any assumptions that the page walking
> > code makes, in particular it does some short cuts on its first linear
> > walk that break as soon as we do a queued page.
> > 
> >   c) We have to be careful to not break up host-page size chunks, since
> > this makes it harder to place the pages on the destination.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
> 
> > +static int ram_save_host_page(MigrationState *ms, QEMUFile *f, RAMBlock* block,
> > +                              ram_addr_t *offset, bool last_stage,
> > +                              uint64_t *bytes_transferred,
> > +                              ram_addr_t dirty_ram_abs)
> > +{
> > +    int tmppages, pages = 0;
> > +    do {
> > +        /* Check the pages is dirty and if it is send it */
> > +        if (migration_bitmap_clear_dirty(dirty_ram_abs)) {
> > +            if (compression_switch && migrate_use_compression()) {
> > +                tmppages = ram_save_compressed_page(f, block, *offset,
> > +                                                    last_stage,
> > +                                                    bytes_transferred);
> > +            } else {
> > +                tmppages = ram_save_page(f, block, *offset, last_stage,
> > +                                         bytes_transferred);
> > +            }
> 
> Something for the future: we should just have ram_save_page which does
> compression (or not); and even encryption (or not), and so on.

Yep, in my current world that's now a 'ram_save_host_page' function
that has that buried in it.

> > +
> > +            if (tmppages < 0) {
> > +                return tmppages;
> > +            } else {
> > +                if (ms->sentmap) {
> > +                    set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
> > +                }
> > +            }
> 
> This else could be dropped as the if stmt returns.

Done

> > +            pages += tmppages;
> > +        }
> > +        *offset += TARGET_PAGE_SIZE;
> > +        dirty_ram_abs += TARGET_PAGE_SIZE;
> > +    } while (*offset & (qemu_host_page_size - 1));
> > +
> > +    /* The offset we leave with is the last one we looked at */
> > +    *offset -= TARGET_PAGE_SIZE;
> > +    return pages;
> > +}
> > +
> > +/**
> >   * ram_find_and_save_block: Finds a dirty page and sends it to f
> >   *
> >   * Called within an RCU critical section.
> > @@ -997,65 +1094,102 @@ err:
> >   * @f: QEMUFile where to send the data
> >   * @last_stage: if we are at the completion stage
> >   * @bytes_transferred: increase it with the number of transferred bytes
> > + *
> > + * On systems where host-page-size > target-page-size it will send all the
> > + * pages in a host page that are dirty.
> >   */
> >  
> >  static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
> >                                     uint64_t *bytes_transferred)
> >  {
> > +    MigrationState *ms = migrate_get_current();
> >      RAMBlock *block = last_seen_block;
> > +    RAMBlock *tmpblock;
> >      ram_addr_t offset = last_offset;
> > +    ram_addr_t tmpoffset;
> >      bool complete_round = false;
> >      int pages = 0;
> > -    MemoryRegion *mr;
> >      ram_addr_t dirty_ram_abs; /* Address of the start of the dirty page in
> >                                   ram_addr_t space */
> >  
> > -    if (!block)
> > +    if (!block) {
> >          block = QLIST_FIRST_RCU(&ram_list.blocks);
> > +        last_was_from_queue = false;
> > +    }
> >  
> > -    while (true) {
> > -        mr = block->mr;
> > -        offset = migration_bitmap_find_and_reset_dirty(mr, offset,
> > -                                                       &dirty_ram_abs);
> > -        if (complete_round && block == last_seen_block &&
> > -            offset >= last_offset) {
> > -            break;
> > -        }
> > -        if (offset >= block->used_length) {
> > -            offset = 0;
> > -            block = QLIST_NEXT_RCU(block, next);
> > -            if (!block) {
> > -                block = QLIST_FIRST_RCU(&ram_list.blocks);
> > -                complete_round = true;
> > -                ram_bulk_stage = false;
> > -                if (migrate_use_xbzrle()) {
> > -                    /* If xbzrle is on, stop using the data compression at this
> > -                     * point. In theory, xbzrle can do better than compression.
> > -                     */
> > -                    flush_compressed_data(f);
> > -                    compression_switch = false;
> > -                }
> > +    while (true) { /* Until we send a block or run out of stuff to send */
> > +        tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &dirty_ram_abs);
> > +
> > +        if (tmpblock) {
> > +            /* We've got a block from the postcopy queue */
> > +            trace_ram_find_and_save_block_postcopy(tmpblock->idstr,
> > +                                                   (uint64_t)tmpoffset,
> > +                                                   (uint64_t)dirty_ram_abs);
> > +            /*
> > +             * We're sending this page, and since it's postcopy nothing else
> > +             * will dirty it, and we must make sure it doesn't get sent again
> > +             * even if this queue request was received after the background
> > +             * search already sent it.
> > +             */
> > +            if (!test_bit(dirty_ram_abs >> TARGET_PAGE_BITS,
> > +                          migration_bitmap)) {
> > +                trace_ram_find_and_save_block_postcopy_not_dirty(
> > +                    tmpblock->idstr, (uint64_t)tmpoffset,
> > +                    (uint64_t)dirty_ram_abs,
> > +                    test_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap));
> > +
> > +                continue;
> >              }
> > +            /*
> > +             * As soon as we start servicing pages out of order, then we have
> > +             * to kill the bulk stage, since the bulk stage assumes
> > +             * in (migration_bitmap_find_and_reset_dirty) that every page is
> > +             * dirty, that's no longer true.
> > +             */
> > +            ram_bulk_stage = false;
> > +            /*
> > +             * We want the background search to continue from the queued page
> > +             * since the guest is likely to want other pages near to the page
> > +             * it just requested.
> > +             */
> > +            block = tmpblock;
> > +            offset = tmpoffset;
> >          } else {
> > -            if (compression_switch && migrate_use_compression()) {
> > -                pages = ram_save_compressed_page(f, block, offset, last_stage,
> > -                                                 bytes_transferred);
> > -            } else {
> > -                pages = ram_save_page(f, block, offset, last_stage,
> > -                                      bytes_transferred);
> > +            MemoryRegion *mr;
> > +            /* priority queue empty, so just search for something dirty */
> > +            mr = block->mr;
> > +            offset = migration_bitmap_find_dirty(mr, offset, &dirty_ram_abs);
> > +            if (complete_round && block == last_seen_block &&
> > +                offset >= last_offset) {
> > +                break;
> >              }
> > -
> > -            /* if page is unmodified, continue to the next */
> > -            if (pages > 0) {
> > -                MigrationState *ms = migrate_get_current();
> > -                if (ms->sentmap) {
> > -                    set_bit(dirty_ram_abs >> TARGET_PAGE_BITS, ms->sentmap);
> > +            if (offset >= block->used_length) {
> > +                offset = 0;
> > +                block = QLIST_NEXT_RCU(block, next);
> > +                if (!block) {
> > +                    block = QLIST_FIRST_RCU(&ram_list.blocks);
> > +                    complete_round = true;
> > +                    ram_bulk_stage = false;
> > +                    if (migrate_use_xbzrle()) {
> > +                        /* If xbzrle is on, stop using the data compression at
> > +                         * this point. In theory, xbzrle can do better than
> > +                         * compression.
> > +                         */
> > +                        flush_compressed_data(f);
> > +                        compression_switch = false;
> > +                    }
> >                  }
> > -
> > -                last_sent_block = block;
> > -                break;
> > +                continue; /* pick an offset in the new block */
> >              }
> >          }
> > +
> > +        pages = ram_save_host_page(ms, f, block, &offset, last_stage,
> > +                                   bytes_transferred, dirty_ram_abs);
> > +
> > +        /* if page is unmodified, continue to the next */
> > +        if (pages > 0) {
> > +            break;
> > +        }
> 
> This function could use splitting into multiple ones.

Done, a separate pair of patches is on list to do that split; please review.

Dave

> 
> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 33/42] postcopy_ram.c: place_page and helpers
  2015-07-14 10:05   ` Juan Quintela
  2015-07-27  6:11     ` Amit Shah
@ 2015-09-23 16:45     ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-09-23 16:45 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > postcopy_place_page (etc) provide a way for postcopy to place a page
> > into guests memory atomically (using the copy ioctl on the ufd).
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> > --- a/include/migration/postcopy-ram.h
> > +++ b/include/migration/postcopy-ram.h
> > @@ -69,4 +69,20 @@ void postcopy_discard_send_range(MigrationState *ms, PostcopyDiscardState *pds,
> >  void postcopy_discard_send_finish(MigrationState *ms,
> >                                    PostcopyDiscardState *pds);
> >  
> > +/*
> > + * Place a page (from) at (host) efficiently
> > + *    There are restrictions on how 'from' must be mapped, in general best
> > + *    to use other postcopy_ routines to allocate.
> > + * returns 0 on success
> > + */
> > +int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> > +                        bool all_zero);
> > +
> > +/*
> > + * Allocate a page of memory that can be mapped at a later point in time
> > + * using postcopy_place_page
> > + * Returns: Pointer to allocated page
> > + */
> > +void *postcopy_get_tmp_page(MigrationIncomingState *mis);
> > +
> 
> I don't think that this makes sense, but wouldn't have been a good idea
> to ask for the address that we want as a hint.  That could help with
> fragmentation, no?

I think that we may be able to do something if we were to transmit huge
pages (which is a separate problem); but at the moment all get_tmp_page
does is an mmap with the right set of flags, and that mmap only happens
once for all the pages; it's only the backing page that gets moved,
that mmap is reused for the whole run.

> > +/*
> > + * Place a host page (from) at (host) atomically
> > + * all_zero: Hint that the page being placed is 0 throughout
> > + * returns 0 on success
> > + */
> > +int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> > +                        bool all_zero)
> 
> postcopy_place_page() and postcopy_place_zero_page()?  They just share a
> trace point :p

Done.

> > +{
> > +    if (!all_zero) {
> > +        struct uffdio_copy copy_struct;
> > +
> > +        copy_struct.dst = (uint64_t)(uintptr_t)host;
> > +        copy_struct.src = (uint64_t)(uintptr_t)from;
> > +        copy_struct.len = getpagesize();
> > +        copy_struct.mode = 0;
> > +
> > +        /* copy also acks to the kernel waking the stalled thread up
> > +         * TODO: We can inhibit that ack and only do it if it was requested
> > +         * which would be slightly cheaper, but we'd have to be careful
> > +         * of the order of updating our page state.
> > +         */
> > +        if (ioctl(mis->userfault_fd, UFFDIO_COPY, &copy_struct)) {
> > +            int e = errno;
> > +            error_report("%s: %s copy host: %p from: %p",
> > +                         __func__, strerror(e), host, from);
> > +
> > +            return -e;
> > +        }
> > +    } else {
> > +        struct uffdio_zeropage zero_struct;
> > +
> > +        zero_struct.range.start = (uint64_t)(uintptr_t)host;
> > +        zero_struct.range.len = getpagesize();
> > +        zero_struct.mode = 0;
> > +
> > +        if (ioctl(mis->userfault_fd, UFFDIO_ZEROPAGE, &zero_struct)) {
> > +            int e = errno;
> > +            error_report("%s: %s zero host: %p from: %p",
> > +                         __func__, strerror(e), host, from);
> > +
> > +            return -e;
> > +        }
> > +    }
> > +
> > +    trace_postcopy_place_page(host, all_zero);
> > +    return 0;
> > +}
> 
> I really think that the userfault code should be in a linux specific
> file, but that can be done late, so I will not insist O:-)

I think it will make sense once we have another OS's view of what the interface
should look like, and then we can get an abstraction that works for both and
move the implementations out.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 23/42] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
  2015-07-21 10:33   ` Amit Shah
@ 2015-09-23 17:04     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-09-23 17:04 UTC (permalink / raw)
  To: Amit Shah
  Cc: aarcange, yamahata, quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

* Amit Shah (amit.shah@redhat.com) wrote:
> On (Tue) 16 Jun 2015 [11:26:36], Dr. David Alan Gilbert (git) wrote:
> 
> > -    if (s->state == MIGRATION_STATUS_ACTIVE ||
> > -        s->state == MIGRATION_STATUS_SETUP) {
> > +    if (migration_already_active(s)) {
> 
> (I know, not introduced here, but:)
> 
> A better name is migration_is_active()

Done.

> 
> > +bool migration_postcopy_phase(MigrationState *s)
> > +{
> > +    return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > +}
> 
> And this is better named migration_in_postcopy()

Done

> 
> otherwise,
> 
> Reviewed-by: Amit Shah <amit.shah@redhat.com>
> 
> 
> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 28/42] Postcopy: Postcopy startup in migration thread
  2015-07-13 18:09       ` Juan Quintela
@ 2015-09-23 17:56         ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-09-23 17:56 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> >> > +     * We need to leave the fd free for page transfers during the
> >> > +     * loading of the device state, so wrap all the remaining
> >> > +     * commands and state into a package that gets sent in one go
> >> > +     */
> >> > +    QEMUFile *fb = qemu_bufopen("w", NULL);
> >> > +    if (!fb) {
> >> > +        error_report("Failed to create buffered file");
> >> > +        goto fail;
> >> > +    }
> >> > +
> >> > +    qemu_savevm_state_complete_precopy(fb);
> >> > +    qemu_savevm_send_ping(fb, 3);
> >> > +
> >> > +    qemu_savevm_send_postcopy_run(fb);
> >> > +
> >> > +    /* <><> end of stuff going into the package */
> >> > +    qsb = qemu_buf_get(fb);
> >> > +
> >> > +    /* Now send that blob */
> >> > +    if (qemu_savevm_send_packaged(ms->file, qsb)) {
> >> > +        goto fail_closefb;
> >> > +    }
> >> > +    qemu_fclose(fb);
> >> 
> >> Why can't we send this directly without the extra copy?
> >> I guess that there are some missing/extra section starts/end whatever?
> >> Anything specific?
> >
> > The problem is that the destination has to be able to read the chunk
> > of migration stream off the fd and leave the fd free for page requests
> > that may be required during loading the device state.
> > Since the migration-stream is unstructured, there is no way to read
> > a chunk of stream off without knowing the length of that chunk, and the
> > only way to know that chunk is to write it to a buffer and then see
> > how big it is.
> 
> Arghhh.  ok.  Comment?

I've changed the comment at the start of that section to:

     * While loading the device state we may trigger page transfer
     * requests and the fd must be free to process those, and thus
     * the destination must read the whole device state off the fd before
     * it starts processing it.  Unfortunately the ad-hoc migration format
     * doesn't allow the destination to know the size to read without fully
     * parsing it through each device's load-state code (especially the open
     * coded devices that use get/put).
     * So we wrap the device state up in a package with a length at the start;
     * to do this we use a qemu_buf to hold the whole of the device state.
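
To illustrate the idea in that comment, the sketch below wraps an opaque blob
of device state with a length in front, so that a reader can pull the whole
blob off the stream before interpreting any of it. It is only a model: the
32-bit big-endian length and the names package_wrap()/package_unwrap() are
assumptions made for the example, not the actual packaged-command format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Write a 32-bit big-endian length followed by the payload.
 * Returns the number of bytes written, or 0 if out is too small.
 */
static size_t package_wrap(uint8_t *out, size_t out_size,
                           const uint8_t *payload, uint32_t len)
{
    if (out_size < 4 || out_size - 4 < len) {
        return 0;
    }
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)len;
    memcpy(out + 4, payload, len);
    return 4 + (size_t)len;
}

/*
 * Given the bytes received so far, return a pointer to the payload and
 * set *len once the whole package is available; return NULL if more
 * data is still needed.  The payload is never parsed here.
 */
static const uint8_t *package_unwrap(const uint8_t *in, size_t avail,
                                     uint32_t *len)
{
    uint32_t n;

    if (avail < 4) {
        return NULL;
    }
    n = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
        ((uint32_t)in[2] << 8) | (uint32_t)in[3];
    if (avail - 4 < n) {
        return NULL;
    }
    *len = n;
    return in + 4;
}
```

The sender only learns the length after the state has been written into a
buffer, which is why the extra copy is needed on the source side.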

Dave

> 
> >
> >> > +    ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
> >> 
> >> Now, that we are here, is there a counter of the time that takes the
> >> postcopy stage?  Just curious.
> >
> > No, not separate.
> >
> >> > +/*
> >> >   * Master migration thread on the source VM.
> >> >   * It drives the migration and pumps the data down the outgoing channel.
> >> >   */
> >> >  static void *migration_thread(void *opaque)
> >> >  {
> >> >      MigrationState *s = opaque;
> >> > +    /* Used by the bandwidth calcs, updated later */
> >> >      int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >> >      int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> >> >      int64_t initial_bytes = 0;
> >> >      int64_t max_size = 0;
> >> >      int64_t start_time = initial_time;
> >> >      bool old_vm_running = false;
> >> > +    bool entered_postcopy = false;
> >> > +    /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
> >> > +    enum MigrationStatus current_active_type = MIGRATION_STATUS_ACTIVE;
> >> 
> >> current_active_state?
> >
> > Changed.
> >
> > Dave
> >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v7 26/42] postcopy: Incoming initialisation
  2015-07-13 12:04   ` Juan Quintela
@ 2015-09-23 19:06     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-09-23 19:06 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  include/migration/migration.h    |   3 +
> >  include/migration/postcopy-ram.h |  12 ++++
> >  migration/postcopy-ram.c         | 116 +++++++++++++++++++++++++++++++++++++++
> >  migration/ram.c                  |  11 ++++
> >  migration/savevm.c               |   4 ++
> >  trace-events                     |   2 +
> >  6 files changed, 148 insertions(+)
> >
> 
> qemu_hugepage_enable(host_addr, length)?
> 
> > +#ifdef MADV_NOHUGEPAGE
> > +    if (madvise(host_addr, length, MADV_NOHUGEPAGE)) {
> > +        error_report("%s: NOHUGEPAGE: %s", __func__, strerror(errno));
> > +        return -1;
> > +    }
> > +#endif
> 
> qemu_hugepage_disable(host_addr, length)?

I've flipped both of those to use qemu_madvise, which is how
it's done in most other places in the codebase (I added
QEMU_MADV_NOHUGEPAGE, since HUGEPAGE was there but not its opposite).
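
For readers following along, a wrapper in that style might look roughly like the sketch below. This is a standalone approximation, not the qemu_madvise implementation: the function name is hypothetical, and falling back to EINVAL when the host headers lack the advice values is an assumption made for the example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical wrapper (not QEMU API): turn transparent hugepages on or
 * off for a range.  If the host headers don't define the advice values,
 * fail with EINVAL instead of compiling the call out, so the caller has
 * a single code path with no #ifdef.
 */
int sketch_set_hugepage(void *addr, size_t len, int enable)
{
    assert(addr != NULL && len > 0);
#if defined(MADV_HUGEPAGE) && defined(MADV_NOHUGEPAGE)
    return madvise(addr, len, enable ? MADV_HUGEPAGE : MADV_NOHUGEPAGE);
#else
    (void)enable;
    errno = EINVAL;
    return -1;
#endif
}
```

The caller would pass 0 before postcopy starts and 1 in the cleanup path, reporting strerror(errno) on failure as the quoted hunks do.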

> > +#ifdef MADV_HUGEPAGE
> > +    if (madvise(host_addr, length, MADV_HUGEPAGE)) {
> > +        error_report("%s HUGEPAGE: %s", __func__, strerror(errno));
> > +        return -1;
> > +    }
> > +#endif
> > +
> > +    /*
> > +     * We can also turn off userfault now since we should have all the
> > +     * pages.   It can be useful to leave it on to debug postcopy
> > +     * if you're not sure it's always getting every page.
> > +     */
> 
> qemu_userfault_unregister(host_addr, length)?

Is it worth wrapping that ioctl, when it's already in a function
that has to do other calls around it?

> > +    range_struct.start = (uintptr_t)host_addr;
> > +    range_struct.len = length;
> > +
> > +    if (ioctl(mis->userfault_fd, UFFDIO_UNREGISTER, &range_struct)) {
> > +        error_report("%s: userfault unregister %s", __func__, strerror(errno));
> > +
> > +        return -1;
> > +    }
> 
> >  
> > +/*
> > + * Allocate data structures etc needed by incoming migration with postcopy-ram
> > + * postcopy-ram's similarly names postcopy_ram_incoming_init does the work
> > + */
> > +int ram_postcopy_incoming_init(MigrationIncomingState *mis)
> > +{
> > +    size_t ram_pages = last_ram_offset() >> TARGET_PAGE_BITS;
> > +
> > +    return postcopy_ram_incoming_init(mis, ram_pages);
> > +}
> > +
> 
> ram_postcopy_incoming_init()
> and
> postcopy_ram_incoming_init()
> 
> ouch  Thinking about better names ....

Agreed; suggestions welcome.  The 'ram' and 'postcopy_ram' prefixes are
both the convention based on the filename.

> >  static int ram_load(QEMUFile *f, void *opaque, int version_id)
> >  {
> >      int flags = 0, ret = 0;
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index e6398dd..f4de52d 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -1238,6 +1238,10 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
> >          return -1;
> >      }
> >  
> > +    if (ram_postcopy_incoming_init(mis)) {
> > +        return -1;
> > +    }
> > +
> 
> how/where we know that this is called soon enough?

It's driven by the sequence of command bytes that walk it through
the state machine.

> >      postcopy_state_set(mis, POSTCOPY_INCOMING_ADVISE);
> >  
> >      return 0;
> > diff --git a/trace-events b/trace-events
> > index 5e8a120..2ffc1c6 100644
> > --- a/trace-events
> > +++ b/trace-events
> > @@ -1498,7 +1498,9 @@ rdma_start_outgoing_migration_after_rdma_source_init(void) ""
> >  
> >  # migration/postcopy-ram.c
> >  postcopy_discard_send_finish(const char *ramblock, int nwords, int ncmds) "%s mask words sent=%d in %d commands"
> > +postcopy_cleanup_area(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
> >  postcopy_ram_discard_range(void *start, void *end) "%p,%p"
> > +postcopy_init_area(const char *ramblock, void *host_addr, size_t offset, size_t length) "%s: %p offset=%zx length=%zx"
> 
> once here, if we have range names before, what about:
> 
> postcopy_ram_cleanup_range()
> postcopy_ram_init_range()

Done.

> And let the ram* functions the same?

Not sure which ones those refer to?

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 22/42] migrate_start_postcopy: Command to trigger transition to postcopy
  2015-07-21  7:40         ` Amit Shah
@ 2015-09-24  9:59           ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-09-24  9:59 UTC (permalink / raw)
  To: Amit Shah
  Cc: aarcange, yamahata, Juan Quintela, liang.z.li, qemu-devel, luis,
	pbonzini, david

* Amit Shah (amit.shah@redhat.com) wrote:
> On (Mon) 13 Jul 2015 [20:07:52], Juan Quintela wrote:
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > * Juan Quintela (quintela@redhat.com) wrote:
> 
> > >> > +void qmp_migrate_start_postcopy(Error **errp)
> > >> > +{
> > >> > +    MigrationState *s = migrate_get_current();
> > >> > +
> > >> > +    if (!migrate_postcopy_ram()) {
> > >> > +        error_setg(errp, "Enable postcopy with migration_set_capability before"
> > >> > +                         " the start of migration");
> > >> > +        return;
> > >> > +    }
> > >> > +
> > >> > +    if (s->state == MIGRATION_STATUS_NONE) {
> > >> 
> > >> I would claim that this check should be:
> > >> 
> > >>     if (s->state != MIGRATION_STATUS_ACTIVE) {
> > >> ??
> > >> 
> > >> FAILED, COMPLETED, CANCELL* don't make sense, right?
> > >
> > > What I'm trying to catch here is people doing:
> > >      migrate_start_postcopy
> > >      migrate tcp:pppp:whereever
> > >
> > >   which wont work, because migrate_init reinitialises
> > > the flag that start previously set.
> > >
> > > However, I also don't want to create a race, since what you do is
> > > typically:
> > >      migrate  tcp:pppp:whereever
> > >    <wait some time, get bored>
> > >      migrate_start_postcopy
> > >
> > > if you're unlucky, and the migration finishes just
> > > at the same time you do the migrate_start_postcopy, do you
> > > want migrate_start_postcopy to fail?  My guess was it
> > > was best for it not to fail, in this case.
> > 
> > Change the order, if it is ACTIVE: do the postcopy thing, otherwise, do
> > the clause that is protected now?  Moving to postcopy only make sense if
> > we are in active.
> 
> Yeah, I tend to agree, because in the cases where migration has failed
> or has been cancelled, we'll end up setting the postcopy bit.  Then,
> upon the next migration, this bit could get reused - resulting in the
> previous condition of setting postcopy bit before starting migration.

No, that doesn't happen;  the bit is cleared at the start of migration
so that race condition doesn't exist.

Dave

> 
> 
> 		Amit
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 41/42] Disable mlock around incoming postcopy
  2015-07-14 15:22   ` Juan Quintela
  2015-07-28  6:02     ` Amit Shah
@ 2015-09-24 10:36     ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-09-24 10:36 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Userfault doesn't work with mlock; mlock is designed to nail down pages
> > so they don't move, userfault is designed to tell you when they're not
> > there.
> >
> > munlock the pages we userfault protect before postcopy.
> > mlock everything again at the end if mlock is enabled.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  include/sysemu/sysemu.h  |  1 +
> >  migration/postcopy-ram.c | 24 ++++++++++++++++++++++++
> >  2 files changed, 25 insertions(+)
> >
> > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > index 1af2ea0..c1f3da4 100644
> > --- a/include/sysemu/sysemu.h
> > +++ b/include/sysemu/sysemu.h
> > @@ -171,6 +171,7 @@ extern int boot_menu;
> >  extern bool boot_strict;
> >  extern uint8_t *boot_splash_filedata;
> >  extern size_t boot_splash_filedata_size;
> > +extern bool enable_mlock;
> >  extern uint8_t qemu_extra_params_fw[2];
> >  extern QEMUClockType rtc_clock;
> >  extern const char *mem_path;
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index 7eb1fb9..be7e5f2 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -85,6 +85,11 @@ static bool ufd_version_check(int ufd)
> >      return true;
> >  }
> >  
> > +/*
> > + * Note: This has the side effect of munlock'ing all of RAM, that's
> > + * normally fine since if the postcopy succeeds it gets turned back on at the
> > + * end.
> > + */
> >  bool postcopy_ram_supported_by_host(void)
> >  {
> >      long pagesize = getpagesize();
> > @@ -113,6 +118,15 @@ bool postcopy_ram_supported_by_host(void)
> >      }
> >  
> >      /*
> > +     * userfault and mlock don't go together; we'll put it back later if
> > +     * it was enabled.
> > +     */
> > +    if (munlockall()) {
> > +        error_report("%s: munlockall: %s", __func__,  strerror(errno));
> 
> 
> > why is this not protected by enable_mlock?

Because there's no harm in doing the munlock, and if something other than
enable_mlock had locked the memory then postcopy would break.

> > +        return -1;
> > +    }
> > +
> > +    /*
> >       *  We need to check that the ops we need are supported on anon memory
> >       *  To do that we need to register a chunk and see the flags that
> >       *  are returned.
> > @@ -303,6 +317,16 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
> >          mis->have_fault_thread = false;
> >      }
> >  
> > +    if (enable_mlock) {
> > +        if (os_mlock() < 0) {
> > +            error_report("mlock: %s", strerror(errno));
> > +            /*
> > +             * It doesn't feel right to fail at this point, we have a valid
> > +             * VM state.
> > +             */
> 
> > realtime_init() exits if os_mlock() fails, so the current code is:
> > 
> > - we start qemu with mlock request
> > - we mlock memory
> > - we start postcopy
> > - we munlock memory
> > - we mlock memory
> > 
> > I would really, really prefer having a check if memory is mlocked, and
> > in that case, just abort migration altogether.

Although it does look like users want the two together.

>  Or better still, wait to
> enable mlock *until* we have finished postcopy, no?

I think there are two problems with that:
   a) It's likely to produce a longer downtime.
    Let's follow your summary points above:

    - we start qemu with mlock request
    - we mlock memory
      !! This can be expensive - we might have to do some swap and the kernel
         has to make sure it has this available.  But at the end we have enough
         memory.    Anyway, this isn't on any critical path; the source is still
         running.
    - we start postcopy
    - we munlock memory
      !! OK, that's not good, but....
    - we mlock memory
      !! There's a pretty good chance that most of the memory we force-allocated
         during the 1st mlock is still available, so this should be faster than the
         1st mlock.

    If we flip it so we do just the mlock at the end of postcopy, it's got a much
    higher chance of needing to swap stuff in.


   b) The main mlock for realtime happens very early in startup in vl.c,
     and that's way before the destination knows it's about to have
     a postcopy migration incoming.
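
The munlock-then-relock sequence discussed above can be sketched as follows. This is a simplified illustration, not the patch itself: the function names are hypothetical, and mlockall(MCL_CURRENT | MCL_FUTURE) is assumed here as a stand-in for os_mlock().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Before postcopy: drop all locks unconditionally.  Unlocking is harmless
 * when nothing was locked, and it also covers locks taken by something
 * other than the enable_mlock option.
 */
int sketch_unlock_for_postcopy(void)
{
    if (munlockall()) {
        fprintf(stderr, "munlockall: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}

/*
 * After postcopy: re-lock only if locking was requested at startup.
 * A failure is reported but not treated as fatal, since the VM state is
 * already valid at this point.  Returns true if memory ended up locked.
 */
bool sketch_relock_after_postcopy(bool enable_mlock)
{
    if (!enable_mlock) {
        return false;
    }
    if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0) {
        fprintf(stderr, "mlock: %s\n", strerror(errno));
        return false;
    }
    return true;
}
```

Doing the first lock at startup and only the re-lock at the end keeps the expensive allocation off the postcopy path, which is the argument made in (a).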

Dave
> 
      
> 
> Later, Juan.
> 
> > +        }
> > +    }
> > +
> >      postcopy_state_set(mis, POSTCOPY_INCOMING_END);
> >      migrate_send_rp_shut(mis, qemu_file_get_error(mis->file) != 0);
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 22/42] migrate_start_postcopy: Command to trigger transition to postcopy
  2015-07-13 18:07       ` Juan Quintela
  2015-07-21  7:40         ` Amit Shah
@ 2015-09-24 14:20         ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 209+ messages in thread
From: Dr. David Alan Gilbert @ 2015-09-24 14:20 UTC (permalink / raw)
  To: Juan Quintela
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > * Juan Quintela (quintela@redhat.com) wrote:
> >> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> >> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >> >
> >> > Once postcopy is enabled (with migrate_set_capability), the migration
> >> > will still start on precopy mode.  To cause a transition into postcopy
> >> > the:
> >> >
> >> >   migrate_start_postcopy
> >> >
> >> > command must be issued.  Postcopy will start sometime after this
> >> > (when it's next checked in the migration loop).
> >> >
> >> > Issuing the command before migration has started will error,
> >> > and issuing after it has finished is ignored.
> >> >
> >> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >> > Reviewed-by: Eric Blake <eblake@redhat.com>
> >> 
> >> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> >> > index a5951ac..e973490 100644
> >> > --- a/include/migration/migration.h
> >> > +++ b/include/migration/migration.h
> >> > @@ -111,6 +111,9 @@ struct MigrationState
> >> >      int64_t xbzrle_cache_size;
> >> >      int64_t setup_time;
> >> >      int64_t dirty_sync_count;
> >> > +
> >> > +    /* Flag set once the migration has been asked to enter postcopy */
> >> > +    bool start_postcopy;
> >> >  };
> >> >  
> >> >  void process_incoming_migration(QEMUFile *f);
> >> > diff --git a/migration/migration.c b/migration/migration.c
> >> > index e77b8b4..6fc47f9 100644
> >> > --- a/migration/migration.c
> >> > +++ b/migration/migration.c
> >> > @@ -465,6 +465,28 @@ void qmp_migrate_set_parameters(bool has_compress_level,
> >> >      }
> >> >  }
> >> >  
> >> > +void qmp_migrate_start_postcopy(Error **errp)
> >> > +{
> >> > +    MigrationState *s = migrate_get_current();
> >> > +
> >> > +    if (!migrate_postcopy_ram()) {
> >> > +        error_setg(errp, "Enable postcopy with migration_set_capability before"
> >> > +                         " the start of migration");
> >> > +        return;
> >> > +    }
> >> > +
> >> > +    if (s->state == MIGRATION_STATUS_NONE) {
> >> 
> >> I would claim that this check should be:
> >> 
> >>     if (s->state != MIGRATION_STATUS_ACTIVE) {
> >> ??
> >> 
> >> FAILED, COMPLETED, CANCELL* don't make sense, right?
> >
> > What I'm trying to catch here is people doing:
> >      migrate_start_postcopy
> >      migrate tcp:pppp:whereever
> >
> >   which wont work, because migrate_init reinitialises
> > the flag that start previously set.
> >
> > However, I also don't want to create a race, since what you do is
> > typically:
> >      migrate  tcp:pppp:whereever
> >    <wait some time, get bored>
> >      migrate_start_postcopy
> >
> > if you're unlucky, and the migration finishes just
> > at the same time you do the migrate_start_postcopy, do you
> > want migrate_start_postcopy to fail?  My guess was it
> > was best for it not to fail, in this case.
> 
> Change the order, if it is ACTIVE: do the postcopy thing, otherwise, do
> the clause that is protected now?  Moving to postcopy only make sense if
> we are in active.

The problem is that that produces a race condition for the command:
if you wait too long and the migration finishes before you issue the command,
you get an error even though the migration has completed perfectly happily.

Dave

> 
> Later, Juan.
> 
> 
> >
> > Dave
> >
> >> 
> >> Thanks, Juan.
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH v7 31/42] Page request: Process incoming page request
  2015-08-06 10:45     ` Dr. David Alan Gilbert
@ 2015-10-20 10:29       ` Juan Quintela
  0 siblings, 0 replies; 209+ messages in thread
From: Juan Quintela @ 2015-10-20 10:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: aarcange, yamahata, liang.z.li, qemu-devel, luis, amit.shah,
	pbonzini, david

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Juan Quintela (quintela@redhat.com) wrote:

>> Once here, do we care about calling malloc with the rcu set?  Or could
>> we just call malloc at the beginning of the function and free it on err
>> if it is not needed?
>
> Why would that be better?

We would make the rcu region smaller; not sure we should care, so
forget it.

Later, Juan.

Thread overview: 209+ messages
2015-06-16 10:26 [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Dr. David Alan Gilbert (git)
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 01/42] Start documenting how postcopy works Dr. David Alan Gilbert (git)
2015-06-17 11:42   ` Juan Quintela
2015-06-17 12:30     ` Dr. David Alan Gilbert
2015-06-18  7:50   ` Li, Liang Z
2015-06-18  8:10     ` Dr. David Alan Gilbert
2015-06-18  8:28     ` Paolo Bonzini
2015-06-19 17:52       ` Dr. David Alan Gilbert
2015-06-26  6:46   ` Yang Hongyang
2015-06-26  7:53     ` zhanghailiang
2015-06-26  8:00       ` Yang Hongyang
2015-06-26  8:10         ` Dr. David Alan Gilbert
2015-06-26  8:19           ` Yang Hongyang
2015-08-04  5:20   ` Amit Shah
2015-08-05 12:21     ` Dr. David Alan Gilbert
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 02/42] Provide runtime Target page information Dr. David Alan Gilbert (git)
2015-06-17 11:43   ` Juan Quintela
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 03/42] Init page sizes in qtest Dr. David Alan Gilbert (git)
2015-06-17 11:49   ` Juan Quintela
2015-07-06  6:14   ` Amit Shah
2015-08-04  5:23   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 04/42] qemu_ram_block_from_host Dr. David Alan Gilbert (git)
2015-06-17 11:54   ` Juan Quintela
2015-07-10  8:36   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 05/42] Add qemu_get_buffer_less_copy to avoid copies some of the time Dr. David Alan Gilbert (git)
2015-06-17 11:57   ` Juan Quintela
2015-06-17 12:33     ` Dr. David Alan Gilbert
2015-07-13  9:08   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 06/42] Add wrapper for setting blocking status on a QEMUFile Dr. David Alan Gilbert (git)
2015-06-17 11:59   ` Juan Quintela
2015-06-17 12:34     ` Dr. David Alan Gilbert
2015-06-17 12:57       ` Juan Quintela
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 07/42] ram_debug_dump_bitmap: Dump a migration bitmap as text Dr. David Alan Gilbert (git)
2015-06-17 12:17   ` Juan Quintela
2015-06-19 17:04     ` Dr. David Alan Gilbert
2015-07-13 10:15       ` Juan Quintela
2015-07-13  9:12   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 08/42] migrate_init: Call from savevm Dr. David Alan Gilbert (git)
2015-06-17 12:18   ` Juan Quintela
2015-07-13  9:13   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 09/42] Rename save_live_complete to save_live_complete_precopy Dr. David Alan Gilbert (git)
2015-06-17 12:20   ` Juan Quintela
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 10/42] Return path: Open a return path on QEMUFile for sockets Dr. David Alan Gilbert (git)
2015-06-17 12:23   ` Juan Quintela
2015-06-17 17:07     ` Dr. David Alan Gilbert
2015-07-13 10:12   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 11/42] Return path: socket_writev_buffer: Block even on non-blocking fd's Dr. David Alan Gilbert (git)
2015-06-17 12:28   ` Juan Quintela
2015-06-19 17:18     ` Dr. David Alan Gilbert
2015-07-13 12:37   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 12/42] Migration commands Dr. David Alan Gilbert (git)
2015-06-17 12:31   ` Juan Quintela
2015-06-19 17:38     ` Dr. David Alan Gilbert
2015-07-13 12:45   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 13/42] Return path: Control commands Dr. David Alan Gilbert (git)
2015-06-17 12:49   ` Juan Quintela
2015-06-23 18:57     ` Dr. David Alan Gilbert
2015-07-13 12:55   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 14/42] Return path: Send responses from destination to source Dr. David Alan Gilbert (git)
2015-06-17 16:30   ` Juan Quintela
2015-06-19 18:42     ` Dr. David Alan Gilbert
2015-07-01  9:29       ` Juan Quintela
2015-08-06 12:18         ` Dr. David Alan Gilbert
2015-07-15  7:31   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 15/42] Return path: Source handling of return path Dr. David Alan Gilbert (git)
2015-07-13 10:29   ` Juan Quintela
2015-08-18 10:23     ` Dr. David Alan Gilbert
2015-07-15  7:50   ` Amit Shah
2015-07-16 11:32     ` Dr. David Alan Gilbert
2015-08-05  8:06   ` zhanghailiang
2015-08-18 10:45     ` Dr. David Alan Gilbert
2015-08-18 11:29       ` zhanghailiang
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 16/42] Rework loadvm path for subloops Dr. David Alan Gilbert (git)
2015-07-13 10:33   ` Juan Quintela
2015-07-15  9:34   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 17/42] Add migration-capability boolean for postcopy-ram Dr. David Alan Gilbert (git)
2015-06-16 15:43   ` Eric Blake
2015-06-16 15:58     ` Dr. David Alan Gilbert
2015-07-15  9:39       ` Amit Shah
2015-07-13 10:35   ` Juan Quintela
2015-07-15  9:40   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 18/42] Add wrappers and handlers for sending/receiving the postcopy-ram migration messages Dr. David Alan Gilbert (git)
2015-07-13 11:02   ` Juan Quintela
2015-07-20 10:13     ` Amit Shah
2015-08-26 14:48     ` Dr. David Alan Gilbert
2015-07-20 10:06   ` Amit Shah
2015-07-27  9:55     ` Dr. David Alan Gilbert
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 19/42] MIG_CMD_PACKAGED: Send a packaged chunk of migration stream Dr. David Alan Gilbert (git)
2015-07-13 11:07   ` Juan Quintela
2015-07-21  6:11   ` Amit Shah
2015-07-27 17:28     ` Dr. David Alan Gilbert
2015-08-04  5:27   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 20/42] Modify save_live_pending for postcopy Dr. David Alan Gilbert (git)
2015-07-13 11:12   ` Juan Quintela
2015-07-31 16:13     ` Dr. David Alan Gilbert
2015-07-21  6:17   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 21/42] postcopy: OS support test Dr. David Alan Gilbert (git)
2015-07-13 11:20   ` Juan Quintela
2015-07-13 16:31     ` Dr. David Alan Gilbert
2015-07-21  7:29   ` Amit Shah
2015-07-27 17:38     ` Dr. David Alan Gilbert
2015-08-04  5:28   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 22/42] migrate_start_postcopy: Command to trigger transition to postcopy Dr. David Alan Gilbert (git)
2015-07-13 11:23   ` Juan Quintela
2015-07-13 17:13     ` Dr. David Alan Gilbert
2015-07-13 18:07       ` Juan Quintela
2015-07-21  7:40         ` Amit Shah
2015-09-24  9:59           ` Dr. David Alan Gilbert
2015-09-24 14:20         ` Dr. David Alan Gilbert
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 23/42] MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state Dr. David Alan Gilbert (git)
2015-07-13 11:27   ` Juan Quintela
2015-07-13 15:53     ` Dr. David Alan Gilbert
2015-07-13 16:26       ` Juan Quintela
2015-07-13 16:48         ` Dr. David Alan Gilbert
2015-07-13 18:05           ` Juan Quintela
2015-07-21 10:33   ` Amit Shah
2015-09-23 17:04     ` Dr. David Alan Gilbert
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 24/42] Add qemu_savevm_state_complete_postcopy Dr. David Alan Gilbert (git)
2015-07-13 11:35   ` Juan Quintela
2015-07-13 15:33     ` Dr. David Alan Gilbert
2015-07-21 10:42   ` Amit Shah
2015-07-27 17:58     ` Dr. David Alan Gilbert
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 25/42] Postcopy: Maintain sentmap and calculate discard Dr. David Alan Gilbert (git)
2015-07-13 11:47   ` Juan Quintela
2015-09-15 17:01     ` Dr. David Alan Gilbert
2015-07-21 11:36   ` Amit Shah
2015-07-31 16:51     ` Dr. David Alan Gilbert
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 26/42] postcopy: Incoming initialisation Dr. David Alan Gilbert (git)
2015-07-13 12:04   ` Juan Quintela
2015-09-23 19:06     ` Dr. David Alan Gilbert
2015-07-22  6:19   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 27/42] postcopy: ram_enable_notify to switch on userfault Dr. David Alan Gilbert (git)
2015-07-13 12:10   ` Juan Quintela
2015-07-13 17:36     ` Dr. David Alan Gilbert
2015-07-23  5:22   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 28/42] Postcopy: Postcopy startup in migration thread Dr. David Alan Gilbert (git)
2015-07-13 12:56   ` Juan Quintela
2015-07-13 17:56     ` Dr. David Alan Gilbert
2015-07-13 18:09       ` Juan Quintela
2015-09-23 17:56         ` Dr. David Alan Gilbert
2015-07-23  5:53       ` Amit Shah
2015-07-23  5:55   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 29/42] Postcopy end in migration_thread Dr. David Alan Gilbert (git)
2015-07-13 13:15   ` Juan Quintela
2015-07-23  6:41     ` Amit Shah
2015-08-04 11:31     ` Dr. David Alan Gilbert
2015-07-23  6:41   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 30/42] Page request: Add MIG_RP_MSG_REQ_PAGES reverse command Dr. David Alan Gilbert (git)
2015-07-13 13:24   ` Juan Quintela
2015-08-06 14:15     ` Dr. David Alan Gilbert
2015-07-23  6:50   ` Amit Shah
2015-08-06 14:21     ` Dr. David Alan Gilbert
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 31/42] Page request: Process incoming page request Dr. David Alan Gilbert (git)
2015-07-14  9:18   ` Juan Quintela
2015-08-06 10:45     ` Dr. David Alan Gilbert
2015-10-20 10:29       ` Juan Quintela
2015-07-23 12:23   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 32/42] Page request: Consume pages off the post-copy queue Dr. David Alan Gilbert (git)
2015-07-14  9:40   ` Juan Quintela
2015-09-16 18:36     ` Dr. David Alan Gilbert
2015-07-27  6:05   ` Amit Shah
2015-09-16 18:48     ` Dr. David Alan Gilbert
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 33/42] postcopy_ram.c: place_page and helpers Dr. David Alan Gilbert (git)
2015-07-14 10:05   ` Juan Quintela
2015-07-27  6:11     ` Amit Shah
2015-09-23 16:45     ` Dr. David Alan Gilbert
2015-07-27  6:11   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 34/42] Postcopy: Use helpers to map pages during migration Dr. David Alan Gilbert (git)
2015-07-14 12:34   ` Juan Quintela
2015-07-17 17:31     ` Dr. David Alan Gilbert
2015-07-27  7:39   ` Amit Shah
2015-08-06 11:22     ` Dr. David Alan Gilbert
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 35/42] Don't sync dirty bitmaps in postcopy Dr. David Alan Gilbert (git)
2015-07-14 12:36   ` Juan Quintela
2015-07-14 13:13     ` Dr. David Alan Gilbert
2015-07-27  7:43   ` Amit Shah
2015-07-31  9:50     ` Dr. David Alan Gilbert
2015-08-04  5:46       ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 36/42] Host page!=target page: Cleanup bitmaps Dr. David Alan Gilbert (git)
2015-07-14 15:01   ` Juan Quintela
2015-07-31 15:53     ` Dr. David Alan Gilbert
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 37/42] Postcopy; Handle userfault requests Dr. David Alan Gilbert (git)
2015-07-14 15:10   ` Juan Quintela
2015-07-14 15:15     ` Dr. David Alan Gilbert
2015-07-14 15:25       ` Juan Quintela
2015-07-27 14:29   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 38/42] Start up a postcopy/listener thread ready for incoming page data Dr. David Alan Gilbert (git)
2015-07-14 15:12   ` Juan Quintela
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 39/42] postcopy: Wire up loadvm_postcopy_handle_ commands Dr. David Alan Gilbert (git)
2015-07-14 15:14   ` Juan Quintela
2015-07-28  5:53   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 40/42] End of migration for postcopy Dr. David Alan Gilbert (git)
2015-07-14 15:15   ` Juan Quintela
2015-07-28  5:55   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 41/42] Disable mlock around incoming postcopy Dr. David Alan Gilbert (git)
2015-07-14 15:22   ` Juan Quintela
2015-07-28  6:02     ` Amit Shah
2015-07-28 11:32       ` Juan Quintela
2015-08-06 14:55         ` Dr. David Alan Gilbert
2015-08-07  3:05           ` zhanghailiang
2015-09-24 10:36     ` Dr. David Alan Gilbert
2015-07-28  6:02   ` Amit Shah
2015-06-16 10:26 ` [Qemu-devel] [PATCH v7 42/42] Inhibit ballooning during postcopy Dr. David Alan Gilbert (git)
2015-07-14 15:24   ` Juan Quintela
2015-07-28  6:15   ` Amit Shah
2015-07-28  9:08     ` Dr. David Alan Gilbert
2015-07-28 10:01       ` Amit Shah
2015-07-28 11:16         ` Dr. David Alan Gilbert
2015-07-28  6:21 ` [Qemu-devel] [PATCH v7 00/42] Postcopy implementation Amit Shah