From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Xen-devel <xen-devel@lists.xen.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Wei Liu <wei.liu2@citrix.com>
Subject: [PATCH v4 14/29] tools/python: Other migration infrastructure
Date: Tue, 14 Jul 2015 11:59:29 +0100	[thread overview]
Message-ID: <1436871584-6522-15-git-send-email-andrew.cooper3@citrix.com> (raw)
In-Reply-To: <1436871584-6522-1-git-send-email-andrew.cooper3@citrix.com>

Contains:
 * Reverse-engineered notes of the legacy format from xg_save_restore.h
 * Python implementation of the legacy format
 * Public HVM Params used in the legacy stream
 * XL header format

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>

---
New in v2 - removes many magic numbers from subsequent scripts
---
 tools/python/xen/migration/legacy.py |  279 ++++++++++++++++++++++++++++++++++
 tools/python/xen/migration/public.py |   21 +++
 tools/python/xen/migration/xl.py     |   12 ++
 3 files changed, 312 insertions(+)
 create mode 100644 tools/python/xen/migration/legacy.py
 create mode 100644 tools/python/xen/migration/public.py
 create mode 100644 tools/python/xen/migration/xl.py

diff --git a/tools/python/xen/migration/legacy.py b/tools/python/xen/migration/legacy.py
new file mode 100644
index 0000000..2f2240a
--- /dev/null
+++ b/tools/python/xen/migration/legacy.py
@@ -0,0 +1,279 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+"""
+Libxc legacy migration streams
+
+Documentation and record structures for legacy migration
+"""
+
+"""
+SAVE/RESTORE/MIGRATE PROTOCOL
+=============================
+
+The general form of a stream of chunks is a header followed by a
+body consisting of a variable number of chunks (terminated by a
+chunk with type 0) followed by a trailer.
+
+For a rolling/checkpoint (e.g. Remus) migration, the body and
+trailer phases can be repeated until an external event
+(e.g. failure) causes the process to terminate and commit to the
+most recent complete checkpoint.
+
+HEADER
+------
+
+unsigned long        : p2m_size
+
+extended-info (PV-only, optional):
+
+  If first unsigned long == ~0UL then extended info is present,
+  otherwise unsigned long is part of p2m. Note that p2m_size above
+  does not include the length of the extended info.
+
+  extended-info:
+
+    unsigned long    : signature == ~0UL
+    uint32_t         : number of bytes remaining in extended-info
+
+    1 or more extended-info blocks of form:
+    char[4]          : block identifier
+    uint32_t         : block data size
+    bytes            : block data
+
+    defined extended-info blocks:
+    "vcpu"           : VCPU context info containing vcpu_guest_context_t.
+                       The precise variant of the context structure
+                       (e.g. 32 vs 64 bit) is distinguished by
+                       the block size.
+    "extv"           : Presence indicates use of extended VCPU context in
+                       tail, data size is 0.
+
+p2m (PV-only):
+
+  consists of p2m_size bytes comprising an array of xen_pfn_t sized entries.
+
+BODY PHASE - Format A (for live migration or Remus without compression)
+----------
+
+A series of chunks with a common header:
+  int              : chunk type
+
+If the chunk type is +ve then the chunk contains guest memory data, and
+the type contains the number of pages in the batch:
+
+    unsigned long[]  : PFN array, length == number of pages in batch
+                       Each entry consists of XEN_DOMCTL_PFINFO_*
+                       in bits 31-28 and the PFN number in bits 27-0.
+    page data        : PAGE_SIZE bytes for each page marked present in PFN
+                       array
+
+If the chunk type is -ve then the chunk consists of one of a number of
+metadata types.  See definitions of XC_SAVE_ID_* below.
+
+If chunk type is 0 then body phase is complete.
+
+
+BODY PHASE - Format B (for Remus with compression)
+----------
+
+A series of chunks with a common header:
+  int              : chunk type
+
+If the chunk type is +ve then the chunk contains an array of PFNs corresponding
+to guest memory, and the type contains the number of PFNs in the batch:
+
+    unsigned long[]  : PFN array, length == number of pages in batch
+                       Each entry consists of XEN_DOMCTL_PFINFO_*
+                       in bits 31-28 and the PFN number in bits 27-0.
+
+If the chunk type is -ve then the chunk consists of one of a number of
+metadata types.  See definitions of XC_SAVE_ID_* below.
+
+If the chunk type is -ve and equals XC_SAVE_ID_COMPRESSED_DATA, then the
+chunk consists of compressed page data, in the following format:
+
+    unsigned long        : Size of the compressed chunk to follow
+    compressed data      : variable length data of size indicated above.
+                           This chunk consists of compressed page data.
+                           The number of pages in one chunk depends on
+                           the amount of space available in the sender's
+                           output buffer.
+
+Format of compressed data:
+  compressed_data = <deltas>*
+  delta           = <marker, run*>
+  marker          = (RUNFLAG|SKIPFLAG) bitwise-or RUNLEN [1 byte marker]
+  RUNFLAG         = 0
+  SKIPFLAG        = 1 << 7
+  RUNLEN          = 7-bit unsigned value indicating number of WORDS in the run
+  run             = string of bytes of length sizeof(WORD) * RUNLEN
+
+  If the marker contains RUNFLAG, then the RUNLEN * sizeof(WORD) bytes of
+  data following the marker are copied into the target page at the offset
+  indicated by offset_ptr.
+  If the marker contains SKIPFLAG, then offset_ptr is advanced by
+  RUNLEN * sizeof(WORD).
+
+If chunk type is 0 then body phase is complete.
+
+There can be one or more chunks with type XC_SAVE_ID_COMPRESSED_DATA,
+containing compressed pages. The compressed chunks are collated to form
+one single compressed chunk for the entire iteration. The number of pages
+present in this final compressed chunk will be equal to the total number
+of valid PFNs specified by the +ve chunks.
+
+At the sender side, compressed pages are inserted into the output stream
+in the same order as they would have been had compression been absent.
+
+Until the last iteration, the BODY is sent in Format A, to maintain live
+migration compatibility with receivers of older Xen versions.
+At the last iteration, if Remus compression was enabled, the sender sends
+a trigger, XC_SAVE_ID_ENABLE_COMPRESSION, to tell the receiver to parse the
+BODY in Format B from the next iteration onwards.
+
+An example sequence of chunks received in Format B:
+    +16                              +ve chunk
+    unsigned long[16]                PFN array
+    +100                             +ve chunk
+    unsigned long[100]               PFN array
+    +50                              +ve chunk
+    unsigned long[50]                PFN array
+
+    XC_SAVE_ID_COMPRESSED_DATA       TAG
+      N                              Length of compressed data
+      N bytes of DATA                Decompresses to 166 pages
+
+    XC_SAVE_ID_*                     other xc save chunks
+    0                                END BODY TAG
+
+Corner case with checkpoint compression:
+    At sender side, after pausing the domain, dirty pages are usually
+  copied out to a temporary buffer. After the domain is resumed,
+  compression is done and the compressed chunk(s) are sent, followed by
+  other XC_SAVE_ID_* chunks.
+    If the temporary buffer gets full while scanning for dirty pages,
+  the sender stops buffering of dirty pages, compresses the temporary
+  buffer and sends the compressed data with XC_SAVE_ID_COMPRESSED_DATA.
+  The sender then resumes the buffering of dirty pages and continues
+  scanning for the dirty pages.
+    For example, assume that the temporary buffer can hold 4096 pages and
+  there are 5000 dirty pages. The following is the sequence of chunks
+  that the receiver will see:
+
+    +1024                       +ve chunk
+    unsigned long[1024]         PFN array
+    +1024                       +ve chunk
+    unsigned long[1024]         PFN array
+    +1024                       +ve chunk
+    unsigned long[1024]         PFN array
+    +1024                       +ve chunk
+    unsigned long[1024]         PFN array
+
+    XC_SAVE_ID_COMPRESSED_DATA  TAG
+     N                          Length of compressed data
+     N bytes of DATA            Decompresses to 4096 pages
+
+    +4                          +ve chunk
+    unsigned long[4]            PFN array
+
+    XC_SAVE_ID_COMPRESSED_DATA  TAG
+     M                          Length of compressed data
+     M bytes of DATA            Decompresses to 4 pages
+
+    XC_SAVE_ID_*                other xc save chunks
+    0                           END BODY TAG
+
+    In other words, XC_SAVE_ID_COMPRESSED_DATA can be interleaved with
+  +ve chunks arbitrarily.  But at the receiver end, the following invariant
+  holds until the end of the BODY PHASE:
+   num(PFN entries in +ve chunks) >= num(pages received in compressed form)
+
+TAIL PHASE
+----------
+
+Content differs for PV and HVM guests.
+
+HVM TAIL:
+
+ "Magic" pages:
+    uint64_t         : I/O req PFN
+    uint64_t         : Buffered I/O req PFN
+    uint64_t         : Store PFN
+ Xen HVM Context:
+    uint32_t         : Length of context in bytes
+    bytes            : Context data
+ Qemu context:
+    char[21]         : Signature:
+      "QemuDeviceModelRecord" : Read Qemu save data until EOF
+      "DeviceModelRecord0002" : uint32_t length field followed by that many
+                                bytes of Qemu save data
+      "RemusDeviceModelState" : Currently the same as "DeviceModelRecord0002".
+
+PV TAIL:
+
+ Unmapped PFN list   : list of all the PFNs that were not mapped at the close
+    unsigned int     : Number of unmapped pages
+    unsigned long[]  : PFNs of unmapped pages
+
+ VCPU context data   : A series of VCPU records, one per present VCPU
+                       Maximum and present map supplied in XC_SAVE_ID_VCPUINFO
+    bytes:           : VCPU context structure. Size is determined by size
+                       provided in extended-info header
+    bytes[128]       : Extended VCPU context (present IFF "extv" block
+                       present in extended-info header)
+
+ Shared Info Page    : 4096 bytes of shared info page
+"""
+
+CHUNK_end                       = 0
+CHUNK_enable_verify_mode        = -1
+CHUNK_vcpu_info                 = -2
+CHUNK_hvm_ident_pt              = -3
+CHUNK_hvm_vm86_tss              = -4
+CHUNK_tmem                      = -5
+CHUNK_tmem_extra                = -6
+CHUNK_tsc_info                  = -7
+CHUNK_hvm_console_pfn           = -8
+CHUNK_last_checkpoint           = -9
+CHUNK_hvm_acpi_ioports_location = -10
+CHUNK_hvm_viridian              = -11
+CHUNK_compressed_data           = -12
+CHUNK_enable_compression        = -13
+CHUNK_hvm_generation_id_addr    = -14
+CHUNK_hvm_paging_ring_pfn       = -15
+CHUNK_hvm_monitor_ring_pfn      = -16
+CHUNK_hvm_sharing_ring_pfn      = -17
+CHUNK_toolstack                 = -18
+CHUNK_hvm_ioreq_server_pfn      = -19
+CHUNK_hvm_nr_ioreq_server_pages = -20
+
+chunk_type_to_str = {
+    CHUNK_end                       : "end",
+    CHUNK_enable_verify_mode        : "enable_verify_mode",
+    CHUNK_vcpu_info                 : "vcpu_info",
+    CHUNK_hvm_ident_pt              : "hvm_ident_pt",
+    CHUNK_hvm_vm86_tss              : "hvm_vm86_tss",
+    CHUNK_tmem                      : "tmem",
+    CHUNK_tmem_extra                : "tmem_extra",
+    CHUNK_tsc_info                  : "tsc_info",
+    CHUNK_hvm_console_pfn           : "hvm_console_pfn",
+    CHUNK_last_checkpoint           : "last_checkpoint",
+    CHUNK_hvm_acpi_ioports_location : "hvm_acpi_ioports_location",
+    CHUNK_hvm_viridian              : "hvm_viridian",
+    CHUNK_compressed_data           : "compressed_data",
+    CHUNK_enable_compression        : "enable_compression",
+    CHUNK_hvm_generation_id_addr    : "hvm_generation_id_addr",
+    CHUNK_hvm_paging_ring_pfn       : "hvm_paging_ring_pfn",
+    CHUNK_hvm_monitor_ring_pfn      : "hvm_monitor_ring_pfn",
+    CHUNK_hvm_sharing_ring_pfn      : "hvm_sharing_ring_pfn",
+    CHUNK_toolstack                 : "toolstack",
+    CHUNK_hvm_ioreq_server_pfn      : "hvm_ioreq_server_pfn",
+    CHUNK_hvm_nr_ioreq_server_pages : "hvm_nr_ioreq_server_pages",
+}
+
+# Up to 1024 pages (4MB) at a time
+MAX_BATCH = 1024
+
+# Maximum #VCPUs currently supported for save/restore
+MAX_VCPU_ID = 4095
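(As an aside, not part of the patch: the chunk-type/PFN-array encoding in the
BODY PHASE notes above can be sketched in Python.  The helper names here are
illustrative, and a host-native 32-bit signed int is assumed for the chunk
type, matching the legacy stream's use of host formats.)

```python
import struct

PFINFO_SHIFT = 28                    # XEN_DOMCTL_PFINFO_* lives in bits 31-28
PFN_MASK = (1 << PFINFO_SHIFT) - 1   # PFN number lives in bits 27-0

def decode_pfn_entry(entry):
    """Split a (32-bit-layout) PFN-array entry into (pfinfo, pfn)."""
    return (entry >> PFINFO_SHIFT) & 0xF, entry & PFN_MASK

def read_chunk_type(stream):
    """Read the signed int chunk type that begins every body chunk.

    +ve => batch of pages, -ve => metadata record, 0 => end of body.
    """
    return struct.unpack("=i", stream.read(4))[0]
```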
diff --git a/tools/python/xen/migration/public.py b/tools/python/xen/migration/public.py
new file mode 100644
index 0000000..fab2f84
--- /dev/null
+++ b/tools/python/xen/migration/public.py
@@ -0,0 +1,21 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+"""
+Xen public ABI constants, used in migration
+"""
+
+HVM_PARAM_STORE_PFN             = 1
+HVM_PARAM_IOREQ_PFN             = 5
+HVM_PARAM_BUFIOREQ_PFN          = 6
+HVM_PARAM_VIRIDIAN              = 9
+HVM_PARAM_IDENT_PT              = 12
+HVM_PARAM_VM86_TSS              = 15
+HVM_PARAM_CONSOLE_PFN           = 17
+HVM_PARAM_ACPI_IOPORTS_LOCATION = 19
+HVM_PARAM_PAGING_RING_PFN       = 27
+HVM_PARAM_MONITOR_RING_PFN      = 28
+HVM_PARAM_SHARING_RING_PFN      = 29
+HVM_PARAM_IOREQ_SERVER_PFN      = 32
+HVM_PARAM_NR_IOREQ_SERVER_PAGES = 33
+HVM_PARAM_VM_GENERATION_ID_ADDR = 34
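(A hypothetical use of these constants, not part of the patch: deriving a
number-to-name map for pretty-printing stream records that carry an HVM param
index.  Only a few of the constants are repeated here so the sketch is
self-contained.)

```python
# A handful of the HVM_PARAM_* constants above, repeated for the sketch.
HVM_PARAM_STORE_PFN    = 1
HVM_PARAM_IOREQ_PFN    = 5
HVM_PARAM_BUFIOREQ_PFN = 6

# Reverse map: param number -> symbolic name.
hvm_param_to_str = {v: n for n, v in list(globals().items())
                    if n.startswith("HVM_PARAM_")}

def param_name(idx):
    """Return the symbolic name for an HVM param number, if known."""
    return hvm_param_to_str.get(idx, "unknown HVM param %d" % idx)
```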
diff --git a/tools/python/xen/migration/xl.py b/tools/python/xen/migration/xl.py
new file mode 100644
index 0000000..978e744
--- /dev/null
+++ b/tools/python/xen/migration/xl.py
@@ -0,0 +1,12 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+"""
+XL migration stream format
+"""
+
+MAGIC = "Xen saved domain, xl format\n \0 \r"
+
+HEADER_FORMAT = "=IIII"
+
+MANDATORY_FLAG_STREAMV2 = 2
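(A sketch, not part of the patch, of consuming the xl header defined above:
the 32-byte magic followed by four uint32 fields.  The individual meanings of
the fields are not spelled out in this file, so the raw tuple is returned and
interpretation is left to the caller.)

```python
import struct

MAGIC = b"Xen saved domain, xl format\n \0 \r"   # 32 bytes
HEADER_FORMAT = "=IIII"
MANDATORY_FLAG_STREAMV2 = 2

def parse_xl_header(stream):
    """Check the magic and return the four uint32 header fields."""
    if stream.read(len(MAGIC)) != MAGIC:
        raise ValueError("not an xl-format save file")
    size = struct.calcsize(HEADER_FORMAT)
    return struct.unpack(HEADER_FORMAT, stream.read(size))
```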
-- 
1.7.10.4
