All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
@ 2015-06-08 15:21 Vladimir Sementsov-Ogievskiy
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification Vladimir Sementsov-Ogievskiy
                   ` (10 more replies)
  0 siblings, 11 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-08 15:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, vsementsov, stefanha, pbonzini, den, jsnow

v2:
 - rebase on my 'Dirty bitmaps migration' series
 - remove 'print dirty bitmap', 'query-dirty-bitmap' and use md5 for
   testing like with dirty bitmaps migration
 - autoclean features

v1:

The bitmaps are saved into qcow2 file format. It provides both
'internal' and 'external' dirty bitmaps feature:
 - for qcow2 drives we can store bitmaps in the same file
 - for other formats we can store bitmaps in the separate qcow2 file

QCow2 header is extended by fields 'nb_dirty_bitmaps' and
'dirty_bitmaps_offset' like with snapshots.

Proposed command line syntax is the following:

-dirty-bitmap [option1=val1][,option2=val2]...
    Available options are:
    name         The name for the bitmap (necessary).

    file         The file to load the bitmap from.

    file_id      When specified with 'file' option, then this file will
                 be available through this id for other -dirty-bitmap
                 options when specified without 'file' option, then it
                 is a reference to 'file', specified with another
                 -dirty-bitmap option, and it will be used to load the
                 bitmap from.

    drive        The drive to bind the bitmap to. It should be specified
                 as 'id' suboption of one of -drive options. If nor
                 'file' neither 'file_id' are specified, then the bitmap
                 will be loaded from that drive (internal dirty bitmap).

    granularity  The granularity for the bitmap. Not necessary, the
                 default value may be used.

    enabled      on|off. Default is 'on'. Disabled bitmaps are not
                 changing regardless of writes to corresponding drive.

Examples:

qemu -drive file=a.qcow2,id=disk -dirty-bitmap name=b,drive=disk
qemu -drive file=a.raw,id=disk \
     -dirty-bitmap name=b,drive=disk,file=b.qcow2,enabled=off

Vladimir Sementsov-Ogievskiy (8):
  spec: add qcow2-dirty-bitmaps specification
  qcow2: add dirty-bitmaps feature
  block: store persistent dirty bitmaps
  block: add bdrv_load_dirty_bitmap
  qcow2: add qcow2_dirty_bitmap_delete_all
  qcow2: add autoclear bit for dirty bitmaps
  qemu: command line option for dirty bitmaps
  iotests: test internal persistent dirty bitmap

 block.c                       |  82 +++++++
 block/Makefile.objs           |   2 +-
 block/qcow2-dirty-bitmap.c    | 537 ++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.c                 |  69 +++++-
 block/qcow2.h                 |  61 +++++
 blockdev.c                    |  38 +++
 docs/specs/qcow2.txt          |  66 ++++++
 include/block/block.h         |   9 +
 include/block/block_int.h     |  10 +
 include/sysemu/blockdev.h     |   1 +
 include/sysemu/sysemu.h       |   1 +
 qemu-options.hx               |  37 +++
 tests/qemu-iotests/118        |  83 +++++++
 tests/qemu-iotests/118.out    |   5 +
 tests/qemu-iotests/group      |   1 +
 tests/qemu-iotests/iotests.py |   6 +
 vl.c                          | 100 ++++++++
 17 files changed, 1105 insertions(+), 3 deletions(-)
 create mode 100644 block/qcow2-dirty-bitmap.c
 create mode 100755 tests/qemu-iotests/118
 create mode 100644 tests/qemu-iotests/118.out

-- 
1.9.1

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-08 15:21 [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps Vladimir Sementsov-Ogievskiy
@ 2015-06-08 15:21 ` Vladimir Sementsov-Ogievskiy
  2015-06-09 16:01   ` John Snow
                     ` (5 more replies)
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature Vladimir Sementsov-Ogievskiy
                   ` (9 subsequent siblings)
  10 siblings, 6 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-08 15:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, vsementsov, Vladimir Sementsov-Ogievskiy, stefanha,
	pbonzini, den, jsnow

From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>

Persistent dirty bitmaps will be saved into qcow2 files. It may be used
as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
other drives (there may be qcow2 file with zero disk size but with
several dirty bitmaps for other drives).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 docs/specs/qcow2.txt | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index 121dfc8..0fffba2 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -123,6 +123,7 @@ be stored. Each extension has a structure like the following:
                         0x00000000 - End of the header extension area
                         0xE2792ACA - Backing file format name
                         0x6803f857 - Feature name table
+                        0x23852875 - Dirty bitmaps
                         other      - Unknown header extension, can be safely
                                      ignored
 
@@ -166,6 +167,19 @@ the header extension data. Each entry look like this:
                     terminated if it has full length)
 
 
+== Dirty bitmaps ==
+
+Dirty bitmaps is an optional header extension. It provides a possibility of
+storing dirty bitmaps in qcow2 image. The fields are:
+
+          0 -  3:  nb_dirty_bitmaps
+                   Number of dirty bitmaps contained in the image
+
+          4 - 11:  dirty_bitmaps_offset
+                   Offset into the image file at which the dirty bitmaps table
+                   starts. Must be aligned to a cluster boundary.
+
+
 == Host cluster management ==
 
 qcow2 manages the allocation of host clusters by maintaining a reference count
@@ -360,3 +374,55 @@ Snapshot table entry:
 
         variable:   Padding to round up the snapshot table entry size to the
                     next multiple of 8.
+
+
+== Dirty bitmaps ==
+
+The feature supports storing several dirty bitmaps in the qcow2 file.
+
+=== Cluster mapping ===
+
+Dirty bitmaps are stored using a ONE-level structure for the mapping of
+bitmaps to host clusters. There is only an L1 table.
+
+The L1 table has a variable size (stored in the Bitmap table entry) and may
+use multiple clusters, however it must be contiguous in the image file.
+
+Given an offset into the bitmap, the offset into the image file can be
+obtained as follows:
+
+    offset = l1_table[offset / cluster_size] + (offset % cluster_size)
+
+L1 table entry:
+
+    Bit  0 -  61:   Standard cluster descriptor
+
+        62 -  63:   Reserved
+
+=== Bitmap table ===
+
+A directory of all bitmaps is stored in the bitmap table, a contiguous area in
+the image file, whose starting offset and length are given by the header fields
+dirty_bitmaps_offset and nb_dirty_bitmaps. The entries of the bitmap table have
+variable length, depending on the length of name and extra data.
+
+Bitmap table entry:
+
+    Byte 0 -  7:    Offset into the image file at which the L1 table for the
+                    bitmap starts. Must be aligned to a cluster boundary.
+
+         8 - 11:    Number of entries in the L1 table of the bitmap
+
+        12 - 15:    Bitmap granularity in bytes
+
+        16 - 23:    Bitmap size in sectors
+
+        24 - 25:    Size of the bitmap name
+
+        variable:   The name of the bitmap (not null terminated)
+
+        variable:   Padding to round up the bitmap table entry size to the
+                    next multiple of 8.
+
+The fields "size", "granularity" and "name" are corresponding with the fields
+in struct BdrvDirtyBitmap.
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature
  2015-06-08 15:21 [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps Vladimir Sementsov-Ogievskiy
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification Vladimir Sementsov-Ogievskiy
@ 2015-06-08 15:21 ` Vladimir Sementsov-Ogievskiy
  2015-06-09 16:52   ` Stefan Hajnoczi
                     ` (4 more replies)
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 3/8] block: store persistent dirty bitmaps Vladimir Sementsov-Ogievskiy
                   ` (8 subsequent siblings)
  10 siblings, 5 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-08 15:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, vsementsov, Vladimir Sementsov-Ogievskiy, stefanha,
	pbonzini, den, jsnow

From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>

Adds dirty-bitmaps feature to qcow2 format as specified in
docs/specs/qcow2.txt

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/Makefile.objs        |   2 +-
 block/qcow2-dirty-bitmap.c | 503 +++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.c              |  56 +++++
 block/qcow2.h              |  50 +++++
 include/block/block_int.h  |  10 +
 5 files changed, 620 insertions(+), 1 deletion(-)
 create mode 100644 block/qcow2-dirty-bitmap.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 0d8c2a4..bff12b4 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,5 +1,5 @@
 block-obj-y += raw_bsd.o qcow.o vdi.o vmdk.o cloop.o bochs.o vpc.o vvfat.o
-block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
+block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o qcow2-dirty-bitmap.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
new file mode 100644
index 0000000..bc0167c
--- /dev/null
+++ b/block/qcow2-dirty-bitmap.c
@@ -0,0 +1,503 @@
+/*
+ * Dirty bitmpas for the QCOW version 2 format
+ *
+ * Copyright (c) 2014-2015 Vladimir Sementsov-Ogievskiy
+ *
+ * This file is derived from qcow2-snapshot.c, original copyright:
+ * Copyright (c) 2004-2006 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "block/qcow2.h"
+
+void qcow2_free_dirty_bitmaps(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
+        g_free(s->dirty_bitmaps[i].name);
+    }
+    g_free(s->dirty_bitmaps);
+    s->dirty_bitmaps = NULL;
+    s->nb_dirty_bitmaps = 0;
+}
+
+int qcow2_read_dirty_bitmaps(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    QCowDirtyBitmapHeader h;
+    QCowDirtyBitmap *bm;
+    int i, name_size;
+    int64_t offset;
+    int ret;
+
+    if (!s->nb_dirty_bitmaps) {
+        s->dirty_bitmaps = NULL;
+        s->dirty_bitmaps_size = 0;
+        return 0;
+    }
+
+    offset = s->dirty_bitmaps_offset;
+    s->dirty_bitmaps = g_new0(QCowDirtyBitmap, s->nb_dirty_bitmaps);
+
+    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
+        /* Read statically sized part of the dirty_bitmap header */
+        offset = align_offset(offset, 8);
+        ret = bdrv_pread(bs->file, offset, &h, sizeof(h));
+        if (ret < 0) {
+            goto fail;
+        }
+
+        offset += sizeof(h);
+        bm = s->dirty_bitmaps + i;
+        bm->l1_table_offset = be64_to_cpu(h.l1_table_offset);
+        bm->l1_size = be32_to_cpu(h.l1_size);
+        bm->bitmap_granularity = be32_to_cpu(h.bitmap_granularity);
+        bm->bitmap_size = be64_to_cpu(h.bitmap_size);
+
+        name_size = be16_to_cpu(h.name_size);
+
+        /* Read dirty_bitmap name */
+        bm->name = g_malloc(name_size + 1);
+        ret = bdrv_pread(bs->file, offset, bm->name, name_size);
+        if (ret < 0) {
+            goto fail;
+        }
+        offset += name_size;
+        bm->name[name_size] = '\0';
+
+        if (offset - s->dirty_bitmaps_offset > QCOW_MAX_DIRTY_BITMAPS_SIZE) {
+            ret = -EFBIG;
+            goto fail;
+        }
+    }
+
+    assert(offset - s->dirty_bitmaps_offset <= INT_MAX);
+    s->dirty_bitmaps_size = offset - s->dirty_bitmaps_offset;
+    return 0;
+
+fail:
+    qcow2_free_dirty_bitmaps(bs);
+    return ret;
+}
+
+/* Add at the end of the file a new table of dirty bitmaps */
+static int qcow2_write_dirty_bitmaps(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    QCowDirtyBitmap *bm;
+    QCowDirtyBitmapHeader h;
+    int i, name_size, dirty_bitmaps_size;
+    int64_t offset, dirty_bitmaps_offset = 0;
+    int ret;
+
+    int old_dirty_bitmaps_size = s->dirty_bitmaps_size;
+    int64_t old_dirty_bitmaps_offset = s->dirty_bitmaps_offset;
+
+    /* Compute the size of the dirty bitmaps table */
+    offset = 0;
+    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
+        bm = s->dirty_bitmaps + i;
+        offset = align_offset(offset, 8);
+        offset += sizeof(h);
+        offset += strlen(bm->name);
+
+        if (offset > QCOW_MAX_DIRTY_BITMAPS_SIZE) {
+            ret = -EFBIG;
+            goto fail;
+        }
+    }
+
+    assert(offset <= INT_MAX);
+    dirty_bitmaps_size = offset;
+
+    /* Allocate space for the new dirty bitmap table */
+    dirty_bitmaps_offset = qcow2_alloc_clusters(bs, dirty_bitmaps_size);
+    offset = dirty_bitmaps_offset;
+    if (offset < 0) {
+        ret = offset;
+        goto fail;
+    }
+    ret = bdrv_flush(bs);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    /* The dirty bitmap table position has not yet been updated, so these
+     * clusters must indeed be completely free */
+    ret = qcow2_pre_write_overlap_check(bs, 0, offset, dirty_bitmaps_size);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    /* Write all dirty bitmaps to the new table */
+    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
+        bm = s->dirty_bitmaps + i;
+        memset(&h, 0, sizeof(h));
+        h.l1_table_offset = cpu_to_be64(bm->l1_table_offset);
+        h.l1_size = cpu_to_be32(bm->l1_size);
+        h.bitmap_granularity = cpu_to_be32(bm->bitmap_granularity);
+        h.bitmap_size = cpu_to_be64(bm->bitmap_size);
+
+        name_size = strlen(bm->name);
+        assert(name_size <= UINT16_MAX);
+        h.name_size = cpu_to_be16(name_size);
+        offset = align_offset(offset, 8);
+
+        ret = bdrv_pwrite(bs->file, offset, &h, sizeof(h));
+        if (ret < 0) {
+            goto fail;
+        }
+        offset += sizeof(h);
+
+        ret = bdrv_pwrite(bs->file, offset, bm->name, name_size);
+        if (ret < 0) {
+            goto fail;
+        }
+        offset += name_size;
+    }
+
+    /*
+     * Update the header extension to point to the new dirty bitmap table. This
+     * requires the new table and its refcounts to be stable on disk.
+     */
+    ret = bdrv_flush(bs);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    s->dirty_bitmaps_offset = dirty_bitmaps_offset;
+    s->dirty_bitmaps_size = dirty_bitmaps_size;
+    ret = qcow2_update_header(bs);
+    if (ret < 0) {
+        fprintf(stderr, "Could not update qcow2 header\n");
+        goto fail;
+    }
+
+    /* Free old dirty bitmap table */
+    qcow2_free_clusters(bs, old_dirty_bitmaps_offset, old_dirty_bitmaps_size,
+                        QCOW2_DISCARD_ALWAYS);
+    return 0;
+
+fail:
+    if (dirty_bitmaps_offset > 0) {
+        qcow2_free_clusters(bs, dirty_bitmaps_offset, dirty_bitmaps_size,
+                            QCOW2_DISCARD_ALWAYS);
+    }
+    return ret;
+}
+
+static int find_dirty_bitmap_by_name(BlockDriverState *bs,
+                                     const char *name)
+{
+    BDRVQcowState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
+        if (!strcmp(s->dirty_bitmaps[i].name, name)) {
+            return i;
+        }
+    }
+
+    return -1;
+}
+
+uint8_t *qcow2_dirty_bitmap_load(BlockDriverState *bs,
+                            const char *name, uint64_t size,
+                            int granularity)
+{
+    BDRVQcowState *s = bs->opaque;
+    int i, dirty_bitmap_index, ret;
+    uint64_t offset;
+    QCowDirtyBitmap *bm;
+    uint64_t *l1_table;
+    uint8_t *buf;
+
+    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
+    if (dirty_bitmap_index < 0) {
+        return NULL;
+    }
+    bm = &s->dirty_bitmaps[dirty_bitmap_index];
+
+    if (size != bm->bitmap_size || granularity != bm->bitmap_granularity) {
+        return NULL;
+    }
+
+    l1_table = g_malloc(bm->l1_size * sizeof(uint64_t));
+    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
+                     bm->l1_size * sizeof(uint64_t));
+    if (ret < 0) {
+        goto fail;
+    }
+
+    buf = g_malloc0(bm->l1_size * s->cluster_size);
+    for (i = 0; i < bm->l1_size; ++i) {
+        offset = be64_to_cpu(l1_table[i]);
+        if (!(offset & 1)) {
+            ret = bdrv_pread(bs->file, offset, buf + i * s->cluster_size,
+                             s->cluster_size);
+            if (ret < 0) {
+                goto fail;
+            }
+        }
+    }
+
+    g_free(l1_table);
+    return buf;
+
+fail:
+    g_free(l1_table);
+    return NULL;
+}
+
+int qcow2_dirty_bitmap_store(BlockDriverState *bs, uint8_t *buf,
+                            const char *name, uint64_t size,
+                            int granularity)
+{
+    BDRVQcowState *s = bs->opaque;
+    int cl_size = s->cluster_size;
+    int i, dirty_bitmap_index, ret = 0, n;
+    uint64_t *l1_table;
+    QCowDirtyBitmap *bm;
+    uint64_t buf_size;
+    uint8_t *p;
+    int sector_granularity = granularity >> BDRV_SECTOR_BITS;
+
+    /* find/create dirty bitmap */
+    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
+    if (dirty_bitmap_index >= 0) {
+        bm = s->dirty_bitmaps + dirty_bitmap_index;
+
+        if (size != bm->bitmap_size ||
+            granularity != bm->bitmap_granularity) {
+            qcow2_dirty_bitmap_delete(bs, name, NULL);
+            dirty_bitmap_index = -1;
+        }
+    }
+    if (dirty_bitmap_index < 0) {
+        qcow2_dirty_bitmap_create(bs, name, size, granularity);
+        dirty_bitmap_index = s->nb_dirty_bitmaps - 1;
+    }
+    bm = s->dirty_bitmaps + dirty_bitmap_index;
+
+    /* read l1 table */
+    l1_table = g_malloc(bm->l1_size * sizeof(uint64_t));
+    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
+                     bm->l1_size * sizeof(uint64_t));
+    if (ret < 0) {
+        goto finish;
+    }
+
+    buf_size = (((size - 1) / sector_granularity) >> 3) + 1;
+    buf_size = align_offset(buf_size, 4);
+    n = buf_size / cl_size;
+    p = buf;
+    for (i = 0; i < bm->l1_size; ++i) {
+        uint64_t addr = be64_to_cpu(l1_table[i]) & ~511;
+        int write_size = (i == n ? (buf_size % cl_size) : cl_size);
+
+        if (buffer_is_zero(p, write_size)) {
+            if (addr) {
+                qcow2_free_clusters(bs, addr, cl_size,
+                                    QCOW2_DISCARD_ALWAYS);
+            }
+            l1_table[i] = cpu_to_be64(1);
+        } else {
+            if (!addr) {
+                addr = qcow2_alloc_clusters(bs, cl_size);
+                l1_table[i] = cpu_to_be64(addr);
+            }
+
+            ret = bdrv_pwrite(bs->file, addr, p, write_size);
+            if (ret < 0) {
+                goto finish;
+            }
+        }
+
+        p += cl_size;
+    }
+
+    ret = bdrv_pwrite(bs->file, bm->l1_table_offset, l1_table,
+                      bm->l1_size * sizeof(uint64_t));
+    if (ret < 0) {
+        goto finish;
+    }
+
+finish:
+    g_free(l1_table);
+    return ret;
+}
+/* if no id is provided, a new one is constructed */
+int qcow2_dirty_bitmap_create(BlockDriverState *bs, const char *name,
+                              uint64_t size, int granularity)
+{
+    BDRVQcowState *s = bs->opaque;
+    QCowDirtyBitmap *new_dirty_bitmap_list = NULL;
+    QCowDirtyBitmap *old_dirty_bitmap_list = NULL;
+    QCowDirtyBitmap sn1, *bm = &sn1;
+    int i, ret;
+    uint64_t *l1_table = NULL;
+    int64_t l1_table_offset;
+    int sector_granularity = granularity >> BDRV_SECTOR_BITS;
+
+    if (s->nb_dirty_bitmaps >= QCOW_MAX_DIRTY_BITMAPS) {
+        return -EFBIG;
+    }
+
+    memset(bm, 0, sizeof(*bm));
+
+    /* Check that the ID is unique */
+    if (find_dirty_bitmap_by_name(bs, name) >= 0) {
+        return -EEXIST;
+    }
+
+    /* Populate bm with passed data */
+    bm->name = g_strdup(name);
+    bm->bitmap_granularity = granularity;
+    bm->bitmap_size = size;
+
+    bm->l1_size =
+        size_to_clusters(s, (((size - 1) / sector_granularity) >> 3) + 1);
+    l1_table_offset =
+        qcow2_alloc_clusters(bs, s->l1_size * sizeof(uint64_t));
+    if (l1_table_offset < 0) {
+        ret = l1_table_offset;
+        goto fail;
+    }
+    bm->l1_table_offset = l1_table_offset;
+
+    l1_table = g_try_new(uint64_t, bm->l1_size);
+    if (l1_table == NULL) {
+        ret = -ENOMEM;
+        goto fail;
+    }
+
+    /* initialize with zero clusters */
+    for (i = 0; i < s->l1_size; i++) {
+        l1_table[i] = cpu_to_be64(1);
+    }
+
+    ret = qcow2_pre_write_overlap_check(bs, 0, bm->l1_table_offset,
+                                        s->l1_size * sizeof(uint64_t));
+    if (ret < 0) {
+        goto fail;
+    }
+
+    ret = bdrv_pwrite(bs->file, bm->l1_table_offset, l1_table,
+                      s->l1_size * sizeof(uint64_t));
+    if (ret < 0) {
+        goto fail;
+    }
+
+    g_free(l1_table);
+    l1_table = NULL;
+
+    /* Append the new dirty bitmap to the dirty bitmap list */
+    new_dirty_bitmap_list = g_new(QCowDirtyBitmap, s->nb_dirty_bitmaps + 1);
+    if (s->dirty_bitmaps) {
+        memcpy(new_dirty_bitmap_list, s->dirty_bitmaps,
+               s->nb_dirty_bitmaps * sizeof(QCowDirtyBitmap));
+        old_dirty_bitmap_list = s->dirty_bitmaps;
+    }
+    s->dirty_bitmaps = new_dirty_bitmap_list;
+    s->dirty_bitmaps[s->nb_dirty_bitmaps++] = *bm;
+
+    ret = qcow2_write_dirty_bitmaps(bs);
+    if (ret < 0) {
+        g_free(s->dirty_bitmaps);
+        s->dirty_bitmaps = old_dirty_bitmap_list;
+        s->nb_dirty_bitmaps--;
+        goto fail;
+    }
+
+    g_free(old_dirty_bitmap_list);
+
+    return 0;
+
+fail:
+    g_free(bm->name);
+    g_free(l1_table);
+
+    return ret;
+}
+
+static int qcow2_dirty_bitmap_free_clusters(BlockDriverState *bs,
+                                            QCowDirtyBitmap *bm)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret, i;
+    uint64_t *l1_table = g_new(uint64_t, bm->l1_size);
+
+    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
+                     bm->l1_size * sizeof(uint64_t));
+    if (ret < 0) {
+        g_free(l1_table);
+        return ret;
+    }
+
+    for (i = 0; i < bm->l1_size; ++i) {
+        uint64_t addr = be64_to_cpu(l1_table[i]);
+        qcow2_free_clusters(bs, addr, s->cluster_size, QCOW2_DISCARD_ALWAYS);
+    }
+
+    qcow2_free_clusters(bs, bm->l1_table_offset, bm->l1_size * sizeof(uint64_t),
+                        QCOW2_DISCARD_ALWAYS);
+
+    g_free(l1_table);
+    return 0;
+}
+
+int qcow2_dirty_bitmap_delete(BlockDriverState *bs,
+                              const char *name,
+                              Error **errp)
+{
+    BDRVQcowState *s = bs->opaque;
+    QCowDirtyBitmap bm;
+    int dirty_bitmap_index, ret = 0;
+
+    /* Search the dirty_bitmap */
+    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
+    if (dirty_bitmap_index < 0) {
+        error_setg(errp, "Can't find the dirty bitmap");
+        return -ENOENT;
+    }
+    bm = s->dirty_bitmaps[dirty_bitmap_index];
+
+    /* Remove it from the dirty_bitmap list */
+    memmove(s->dirty_bitmaps + dirty_bitmap_index,
+            s->dirty_bitmaps + dirty_bitmap_index + 1,
+            (s->nb_dirty_bitmaps - dirty_bitmap_index - 1) * sizeof(bm));
+    s->nb_dirty_bitmaps--;
+    ret = qcow2_write_dirty_bitmaps(bs);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "Failed to remove dirty bitmap"
+                         " from dirty bitmap list");
+        return ret;
+    }
+
+    qcow2_dirty_bitmap_free_clusters(bs, &bm);
+    g_free(bm.name);
+
+    return ret;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index b9a72e3..406e55d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -61,6 +61,7 @@ typedef struct {
 #define  QCOW2_EXT_MAGIC_END 0
 #define  QCOW2_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA
 #define  QCOW2_EXT_MAGIC_FEATURE_TABLE 0x6803f857
+#define  QCOW2_EXT_MAGIC_DIRTY_BITMAPS 0x23852875
 
 static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
@@ -90,6 +91,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
     QCowExtension ext;
     uint64_t offset;
     int ret;
+    Qcow2DirtyBitmapHeaderExt dirty_bitmaps_ext;
 
 #ifdef DEBUG_EXT
     printf("qcow2_read_extensions: start=%ld end=%ld\n", start_offset, end_offset);
@@ -160,6 +162,33 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
             }
             break;
 
+        case QCOW2_EXT_MAGIC_DIRTY_BITMAPS:
+            ret = bdrv_pread(bs->file, offset, &dirty_bitmaps_ext, ext.len);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "ERROR: dirty_bitmaps_ext: "
+                                 "Could not read ext header");
+                return ret;
+            }
+
+            be64_to_cpus(&dirty_bitmaps_ext.dirty_bitmaps_offset);
+            be32_to_cpus(&dirty_bitmaps_ext.nb_dirty_bitmaps);
+
+            s->dirty_bitmaps_offset = dirty_bitmaps_ext.dirty_bitmaps_offset;
+            s->nb_dirty_bitmaps = dirty_bitmaps_ext.nb_dirty_bitmaps;
+
+            ret = qcow2_read_dirty_bitmaps(bs);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "Could not read dirty bitmaps");
+                return ret;
+            }
+
+#ifdef DEBUG_EXT
+            printf("Qcow2: Got dirty bitmaps extension:"
+                   " offset=%" PRIu64 " nb_bitmaps=%" PRIu32 "\n",
+                   s->dirty_bitmaps_offset, s->nb_dirty_bitmaps);
+#endif
+            break;
+
         default:
             /* unknown magic - save it in case we need to rewrite the header */
             {
@@ -1000,6 +1029,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     g_free(s->unknown_header_fields);
     cleanup_unknown_header_ext(bs);
     qcow2_free_snapshots(bs);
+    qcow2_free_dirty_bitmaps(bs);
     qcow2_refcount_close(bs);
     qemu_vfree(s->l1_table);
     /* else pre-write overlap checks in cache_destroy may crash */
@@ -1466,6 +1496,7 @@ static void qcow2_close(BlockDriverState *bs)
     qemu_vfree(s->cluster_data);
     qcow2_refcount_close(bs);
     qcow2_free_snapshots(bs);
+    qcow2_free_dirty_bitmaps(bs);
 }
 
 static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
@@ -1667,6 +1698,21 @@ int qcow2_update_header(BlockDriverState *bs)
     buf += ret;
     buflen -= ret;
 
+    if (s->nb_dirty_bitmaps > 0) {
+        Qcow2DirtyBitmapHeaderExt dirty_bitmaps_header = {
+            .nb_dirty_bitmaps = cpu_to_be32(s->nb_dirty_bitmaps),
+            .dirty_bitmaps_offset = cpu_to_be64(s->dirty_bitmaps_offset)
+        };
+        ret = header_ext_add(buf, QCOW2_EXT_MAGIC_DIRTY_BITMAPS,
+                             &dirty_bitmaps_header, sizeof(dirty_bitmaps_header),
+                             buflen);
+        if (ret < 0) {
+            goto fail;
+        }
+        buf += ret;
+        buflen -= ret;
+    }
+
     /* Keep unknown header extensions */
     QLIST_FOREACH(uext, &s->unknown_header_ext, next) {
         ret = header_ext_add(buf, uext->magic, uext->data, uext->len, buflen);
@@ -2176,6 +2222,12 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset)
         return -ENOTSUP;
     }
 
+    /* cannot proceed if image has dirty_bitmaps */
+    if (s->nb_dirty_bitmaps) {
+        error_report("Can't resize an image which has dirty bitmaps");
+        return -ENOTSUP;
+    }
+
     /* shrinking is currently not supported */
     if (offset < bs->total_sectors * 512) {
         error_report("qcow2 doesn't support shrinking images yet");
@@ -2952,6 +3004,10 @@ BlockDriver bdrv_qcow2 = {
     .bdrv_get_info          = qcow2_get_info,
     .bdrv_get_specific_info = qcow2_get_specific_info,
 
+    .bdrv_dirty_bitmap_load = qcow2_dirty_bitmap_load,
+    .bdrv_dirty_bitmap_store = qcow2_dirty_bitmap_store,
+    .bdrv_dirty_bitmap_delete = qcow2_dirty_bitmap_delete,
+
     .bdrv_save_vmstate    = qcow2_save_vmstate,
     .bdrv_load_vmstate    = qcow2_load_vmstate,
 
diff --git a/block/qcow2.h b/block/qcow2.h
index 422b825..24beee0 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -39,6 +39,7 @@
 
 #define QCOW_MAX_CRYPT_CLUSTERS 32
 #define QCOW_MAX_SNAPSHOTS 65536
+#define QCOW_MAX_DIRTY_BITMAPS 65536
 
 /* 8 MB refcount table is enough for 2 PB images at 64k cluster size
  * (128 GB for 512 byte clusters, 2 EB for 2 MB clusters) */
@@ -52,6 +53,8 @@
  * space for snapshot names and IDs */
 #define QCOW_MAX_SNAPSHOTS_SIZE (1024 * QCOW_MAX_SNAPSHOTS)
 
+#define QCOW_MAX_DIRTY_BITMAPS_SIZE (1024 * QCOW_MAX_DIRTY_BITMAPS)
+
 /* indicate that the refcount of the referenced cluster is exactly one. */
 #define QCOW_OFLAG_COPIED     (1ULL << 63)
 /* indicate that the cluster is compressed (they never have the copied flag) */
@@ -138,6 +141,19 @@ typedef struct QEMU_PACKED QCowSnapshotHeader {
     /* name follows  */
 } QCowSnapshotHeader;
 
+typedef struct QEMU_PACKED QCowDirtyBitmapHeader {
+    /* header is 8 byte aligned */
+    uint64_t l1_table_offset;
+
+    uint32_t l1_size;
+    uint32_t bitmap_granularity;
+
+    uint64_t bitmap_size;
+    uint16_t name_size;
+
+    /* name follows  */
+} QCowDirtyBitmapHeader;
+
 typedef struct QEMU_PACKED QCowSnapshotExtraData {
     uint64_t vm_state_size_large;
     uint64_t disk_size;
@@ -156,6 +172,14 @@ typedef struct QCowSnapshot {
     uint64_t vm_clock_nsec;
 } QCowSnapshot;
 
+typedef struct QCowDirtyBitmap {
+    uint64_t l1_table_offset;
+    uint32_t l1_size;
+    char *name;
+    int bitmap_granularity;
+    uint64_t bitmap_size;
+} QCowDirtyBitmap;
+
 struct Qcow2Cache;
 typedef struct Qcow2Cache Qcow2Cache;
 
@@ -218,6 +242,11 @@ typedef uint64_t Qcow2GetRefcountFunc(const void *refcount_array,
 typedef void Qcow2SetRefcountFunc(void *refcount_array,
                                   uint64_t index, uint64_t value);
 
+typedef struct Qcow2DirtyBitmapHeaderExt {
+    uint32_t nb_dirty_bitmaps;
+    uint64_t dirty_bitmaps_offset;
+} QEMU_PACKED Qcow2DirtyBitmapHeaderExt;
+
 typedef struct BDRVQcowState {
     int cluster_bits;
     int cluster_size;
@@ -259,6 +288,11 @@ typedef struct BDRVQcowState {
     unsigned int nb_snapshots;
     QCowSnapshot *snapshots;
 
+    uint64_t dirty_bitmaps_offset;
+    int dirty_bitmaps_size;
+    unsigned int nb_dirty_bitmaps;
+    QCowDirtyBitmap *dirty_bitmaps;
+
     int flags;
     int qcow_version;
     bool use_lazy_refcounts;
@@ -570,6 +604,22 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
 void qcow2_free_snapshots(BlockDriverState *bs);
 int qcow2_read_snapshots(BlockDriverState *bs);
 
+/* qcow2-dirty-bitmap.c functions */
+int qcow2_dirty_bitmap_store(BlockDriverState *bs, uint8_t *buf,
+                             const char *name, uint64_t size,
+                             int granularity);
+uint8_t *qcow2_dirty_bitmap_load(BlockDriverState *bs,
+                                 const char *name, uint64_t size,
+                                 int granularity);
+int qcow2_dirty_bitmap_create(BlockDriverState *bs, const char *name,
+                              uint64_t size, int granularity);
+int qcow2_dirty_bitmap_delete(BlockDriverState *bs,
+                              const char *name,
+                              Error **errp);
+
+void qcow2_free_dirty_bitmaps(BlockDriverState *bs);
+int qcow2_read_dirty_bitmaps(BlockDriverState *bs);
+
 /* qcow2-cache.c functions */
 Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
 int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index db29b74..88855b4 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -206,6 +206,16 @@ struct BlockDriver {
     int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi);
     ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs);
 
+    int (*bdrv_dirty_bitmap_store)(BlockDriverState *bs, uint8_t *buf,
+                                   const char *name, uint64_t size,
+                                   int granularity);
+    uint8_t *(*bdrv_dirty_bitmap_load)(BlockDriverState *bs,
+                                       const char *name, uint64_t size,
+                                       int granularity);
+    int (*bdrv_dirty_bitmap_delete)(BlockDriverState *bs,
+                                    const char *name,
+                                    Error **errp);
+
     int (*bdrv_save_vmstate)(BlockDriverState *bs, QEMUIOVector *qiov,
                              int64_t pos);
     int (*bdrv_load_vmstate)(BlockDriverState *bs, uint8_t *buf,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [Qemu-devel] [PATCH 3/8] block: store persistent dirty bitmaps
  2015-06-08 15:21 [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps Vladimir Sementsov-Ogievskiy
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification Vladimir Sementsov-Ogievskiy
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature Vladimir Sementsov-Ogievskiy
@ 2015-06-08 15:21 ` Vladimir Sementsov-Ogievskiy
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 4/8] block: add bdrv_load_dirty_bitmap Vladimir Sementsov-Ogievskiy
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-08 15:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, vsementsov, Vladimir Sementsov-Ogievskiy, stefanha,
	pbonzini, den, jsnow

From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>

Persistent dirty bitmaps are the bitmaps, for which the new field
BdrvDirtyBitmap.file is not NULL. We save all persistent dirty bitmaps
owned by BlockDriverState in corresponding bdrv_close().
BdrvDirtyBitmap.file is a BlockDriverState, where we want to save the
bitmap. It may be set in bdrv_dirty_bitmap_set_file() only once.
bdrv_ref/bdrv_unref are used for BdrvDirtyBitmap.file to be sure that
files will be closed and resources will be freed.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block.c               | 45 +++++++++++++++++++++++++++++++++++++++++++++
 include/block/block.h |  4 ++++
 2 files changed, 49 insertions(+)

diff --git a/block.c b/block.c
index 575584d..74d4edc 100644
--- a/block.c
+++ b/block.c
@@ -70,6 +70,8 @@ struct BdrvDirtyBitmap {
     BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status */
     char *name;                 /* Optional non-empty unique ID */
     int64_t size;               /* Size of the bitmap (Number of sectors) */
+    BlockDriverState *file;     /* File where bitmap is loaded from (and should
+                                   be saved to) */
     bool disabled;              /* Bitmap is read-only */
     QLIST_ENTRY(BdrvDirtyBitmap) list;
 };
@@ -1710,6 +1712,7 @@ void bdrv_reopen_abort(BDRVReopenState *reopen_state)
 void bdrv_close(BlockDriverState *bs)
 {
     BdrvAioNotifier *ban, *ban_next;
+    BdrvDirtyBitmap *bm, *bm_next;
 
     if (bs->job) {
         block_job_cancel_sync(bs->job);
@@ -1719,6 +1722,15 @@ void bdrv_close(BlockDriverState *bs)
     bdrv_drain_all(); /* in case flush left pending I/O */
     notifier_list_notify(&bs->close_notifiers, bs);
 
+    /* save and release persistent dirty bitmaps */
+    QLIST_FOREACH_SAFE(bm, &bs->dirty_bitmaps, list, bm_next) {
+        if (bm->file) {
+            bdrv_store_dirty_bitmap(bm);
+            bdrv_unref(bm->file);
+            bdrv_release_dirty_bitmap(bs, bm);
+        }
+    }
+
     if (bs->drv) {
         if (bs->backing_hd) {
             BlockDriverState *backing_hd = bs->backing_hd;
@@ -3097,6 +3109,30 @@ void bdrv_release_meta_bitmap(BdrvDirtyBitmap *bitmap)
     }
 }
 
+int bdrv_store_dirty_bitmap(BdrvDirtyBitmap *bitmap)
+{
+    BlockDriverState *bs = bitmap->file;
+    uint8_t *buf;
+    uint64_t size;
+    assert(bs);
+    assert(bs->drv);
+    assert(bs->drv->bdrv_dirty_bitmap_store);
+
+    size = hbitmap_data_size(bitmap->bitmap, bitmap->size);
+    size = (size + 3) & ~3;
+    buf = g_malloc(size);
+
+    hbitmap_serialize_part(bitmap->bitmap, buf, 0, bitmap->size);
+
+    int res = bs->drv->bdrv_dirty_bitmap_store(bs, buf,
+                                               bitmap->name,
+                                               bitmap->size,
+                                               bdrv_dirty_bitmap_granularity(bitmap));
+
+    g_free(buf);
+    return res;
+}
+
 BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
                                           uint32_t granularity,
                                           const char *name,
@@ -3257,6 +3293,15 @@ void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
     }
 }
 
+void bdrv_dirty_bitmap_set_file(BdrvDirtyBitmap *bitmap, BlockDriverState *file)
+{
+    assert(bitmap->file == NULL);
+    bitmap->file = file;
+    if (file != NULL) {
+        bdrv_ref(file);
+    }
+}
+
 void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap)
 {
     assert(!bdrv_dirty_bitmap_frozen(bitmap));
diff --git a/include/block/block.h b/include/block/block.h
index 593c29e..6e82597 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -468,6 +468,8 @@ BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs,
                                         const char *name);
 void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap);
 void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
+void bdrv_dirty_bitmap_set_file(BdrvDirtyBitmap *bitmap,
+                                BlockDriverState *file);
 void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap);
 void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap);
 BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs);
@@ -506,6 +508,8 @@ HBitmap *bdrv_create_meta_bitmap(BdrvDirtyBitmap *bitmap,
                                  uint64_t granularity);
 void bdrv_release_meta_bitmap(BdrvDirtyBitmap *bitmap);
 
+int bdrv_store_dirty_bitmap(BdrvDirtyBitmap *bitmap);
+
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
 void bdrv_disable_copy_on_read(BlockDriverState *bs);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [Qemu-devel] [PATCH 4/8] block: add bdrv_load_dirty_bitmap
  2015-06-08 15:21 [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 3/8] block: store persistent dirty bitmaps Vladimir Sementsov-Ogievskiy
@ 2015-06-08 15:21 ` Vladimir Sementsov-Ogievskiy
  2015-06-09 16:01   ` Stefan Hajnoczi
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 5/8] qcow2: add qcow2_dirty_bitmap_delete_all Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-08 15:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, vsementsov, Vladimir Sementsov-Ogievskiy, stefanha,
	pbonzini, den, jsnow

From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>

The funcion loads dirty bitmap from file, using underlying driver
function.

Note: the function doesn't change BdrvDirtyBitmap.file field. This field
is only used by bdrv_store_dirty_bitmap() function and is ONLY written
by bdrv_dirty_bitmap_set_file() function.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block.c               | 37 +++++++++++++++++++++++++++++++++++++
 include/block/block.h |  5 +++++
 2 files changed, 42 insertions(+)

diff --git a/block.c b/block.c
index 74d4edc..6230717 100644
--- a/block.c
+++ b/block.c
@@ -3109,6 +3109,43 @@ void bdrv_release_meta_bitmap(BdrvDirtyBitmap *bitmap)
     }
 }
 
+BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs,
+                                        BlockDriverState *file,
+                                        int granularity,
+                                        const char *name,
+                                        Error **errp)
+{
+    BlockDriver *drv = file->drv;
+    if (!drv) {
+        return NULL;
+    }
+    if (drv->bdrv_dirty_bitmap_load) {
+        BdrvDirtyBitmap *bitmap;
+        uint64_t bitmap_size = bdrv_nb_sectors(bs);
+        uint8_t *buf = drv->bdrv_dirty_bitmap_load(file, name, bitmap_size,
+                                                   granularity);
+        if (buf == NULL) {
+            return NULL;
+        }
+
+        bitmap = bdrv_create_dirty_bitmap(bs, granularity, name, errp);
+        if (bitmap == NULL) {
+            g_free(buf);
+            return NULL;
+        }
+
+        hbitmap_deserialize_part(bitmap->bitmap, buf, 0, bitmap_size);
+        hbitmap_deserialize_finish(bitmap->bitmap);
+
+        return bitmap;
+    }
+    if (file->file)  {
+        return bdrv_load_dirty_bitmap(bs, file->file, granularity, name,
+                                      errp);
+    }
+    return NULL;
+}
+
 int bdrv_store_dirty_bitmap(BdrvDirtyBitmap *bitmap)
 {
     BlockDriverState *bs = bitmap->file;
diff --git a/include/block/block.h b/include/block/block.h
index 6e82597..fcdb0f3 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -508,6 +508,11 @@ HBitmap *bdrv_create_meta_bitmap(BdrvDirtyBitmap *bitmap,
                                  uint64_t granularity);
 void bdrv_release_meta_bitmap(BdrvDirtyBitmap *bitmap);
 
+BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs,
+                                        BlockDriverState *file,
+                                        int granularity,
+                                        const char *name,
+                                        Error **errp);
 int bdrv_store_dirty_bitmap(BdrvDirtyBitmap *bitmap);
 
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [Qemu-devel] [PATCH 5/8] qcow2: add qcow2_dirty_bitmap_delete_all
  2015-06-08 15:21 [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 4/8] block: add bdrv_load_dirty_bitmap Vladimir Sementsov-Ogievskiy
@ 2015-06-08 15:21 ` Vladimir Sementsov-Ogievskiy
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 6/8] qcow2: add autoclear bit for dirty bitmaps Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-08 15:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, vsementsov, Vladimir Sementsov-Ogievskiy, stefanha,
	pbonzini, den, jsnow

From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2-dirty-bitmap.c | 29 +++++++++++++++++++++++++++++
 block/qcow2.h              |  2 ++
 2 files changed, 31 insertions(+)

diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
index bc0167c..db83112 100644
--- a/block/qcow2-dirty-bitmap.c
+++ b/block/qcow2-dirty-bitmap.c
@@ -501,3 +501,32 @@ int qcow2_dirty_bitmap_delete(BlockDriverState *bs,
 
     return ret;
 }
+
+int qcow2_delete_all_dirty_bitmaps(BlockDriverState *bs, Error **errp)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret = 0, i;
+
+    for (i = 0; i < s->nb_dirty_bitmaps; ++i) {
+        ret = qcow2_dirty_bitmap_free_clusters(bs, s->dirty_bitmaps + i);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret,
+                             "Failed to free dirty bitmap clusters");
+            return ret;
+        }
+        g_free(s->dirty_bitmaps[i].name);
+    }
+
+    g_free(s->dirty_bitmaps);
+    s->nb_dirty_bitmaps = 0;
+
+    ret = qcow2_write_dirty_bitmaps(bs);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret,
+                         "Failed to remove dirty bitmaps"
+                         " from dirty bitmap list");
+        return ret;
+    }
+
+    return ret;
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index 24beee0..b5e576c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -616,6 +616,8 @@ int qcow2_dirty_bitmap_create(BlockDriverState *bs, const char *name,
 int qcow2_dirty_bitmap_delete(BlockDriverState *bs,
                               const char *name,
                               Error **errp);
+int qcow2_delete_all_dirty_bitmaps(BlockDriverState *bs,
+                                   Error **errp);
 
 void qcow2_free_dirty_bitmaps(BlockDriverState *bs);
 int qcow2_read_dirty_bitmaps(BlockDriverState *bs);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [Qemu-devel] [PATCH 6/8] qcow2: add autoclear bit for dirty bitmaps
  2015-06-08 15:21 [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 5/8] qcow2: add qcow2_dirty_bitmap_delete_all Vladimir Sementsov-Ogievskiy
@ 2015-06-08 15:21 ` Vladimir Sementsov-Ogievskiy
  2015-06-09 15:49   ` Stefan Hajnoczi
                     ` (2 more replies)
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 7/8] qemu: command line option " Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  10 siblings, 3 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-08 15:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, vsementsov, Vladimir Sementsov-Ogievskiy, stefanha,
	pbonzini, den, jsnow

From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2-dirty-bitmap.c |  5 +++++
 block/qcow2.c              | 13 +++++++++++--
 block/qcow2.h              |  9 +++++++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
index db83112..686a121 100644
--- a/block/qcow2-dirty-bitmap.c
+++ b/block/qcow2-dirty-bitmap.c
@@ -188,6 +188,11 @@ static int qcow2_write_dirty_bitmaps(BlockDriverState *bs)
 
     s->dirty_bitmaps_offset = dirty_bitmaps_offset;
     s->dirty_bitmaps_size = dirty_bitmaps_size;
+    if (s->nb_dirty_bitmaps > 0) {
+        s->autoclear_features |= QCOW2_AUTOCLEAR_DIRTY_BITMAPS;
+    } else {
+        s->autoclear_features &= ~QCOW2_AUTOCLEAR_DIRTY_BITMAPS;
+    }
     ret = qcow2_update_header(bs);
     if (ret < 0) {
         fprintf(stderr, "Could not update qcow2 header\n");
diff --git a/block/qcow2.c b/block/qcow2.c
index 406e55d..f85a55a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -182,6 +182,14 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
                 return ret;
             }
 
+            if (!(s->autoclear_features & QCOW2_AUTOCLEAR_DIRTY_BITMAPS) &&
+                s->nb_dirty_bitmaps > 0) {
+                ret = qcow2_delete_all_dirty_bitmaps(bs, errp);
+                if (ret < 0) {
+                    return ret;
+                }
+            }
+
 #ifdef DEBUG_EXT
             printf("Qcow2: Got dirty bitmaps extension:"
                    " offset=%" PRIu64 " nb_bitmaps=%" PRIu32 "\n",
@@ -928,8 +936,9 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     /* Clear unknown autoclear feature bits */
-    if (!bs->read_only && !(flags & BDRV_O_INCOMING) && s->autoclear_features) {
-        s->autoclear_features = 0;
+    if (!bs->read_only && !(flags & BDRV_O_INCOMING) &&
+        (s->autoclear_features & ~QCOW2_AUTOCLEAR_MASK)) {
+        s->autoclear_features |= QCOW2_AUTOCLEAR_MASK;
         ret = qcow2_update_header(bs);
         if (ret < 0) {
             error_setg_errno(errp, -ret, "Could not update qcow2 header");
diff --git a/block/qcow2.h b/block/qcow2.h
index b5e576c..14bd6f9 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -215,6 +215,15 @@ enum {
     QCOW2_COMPAT_FEAT_MASK            = QCOW2_COMPAT_LAZY_REFCOUNTS,
 };
 
+/* Autoclear feature bits */
+enum {
+    QCOW2_AUTOCLEAR_DIRTY_BITMAPS_BITNR = 0,
+    QCOW2_AUTOCLEAR_DIRTY_BITMAPS       =
+        1 << QCOW2_AUTOCLEAR_DIRTY_BITMAPS_BITNR,
+
+    QCOW2_AUTOCLEAR_MASK                = QCOW2_AUTOCLEAR_DIRTY_BITMAPS,
+};
+
 enum qcow2_discard_type {
     QCOW2_DISCARD_NEVER = 0,
     QCOW2_DISCARD_ALWAYS,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [Qemu-devel] [PATCH 7/8] qemu: command line option for dirty bitmaps
  2015-06-08 15:21 [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 6/8] qcow2: add autoclear bit for dirty bitmaps Vladimir Sementsov-Ogievskiy
@ 2015-06-08 15:21 ` Vladimir Sementsov-Ogievskiy
  2015-06-11 20:57   ` John Snow
  2015-06-12 21:49   ` John Snow
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 8/8] iotests: test internal persistent dirty bitmap Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-08 15:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, vsementsov, Vladimir Sementsov-Ogievskiy, stefanha,
	pbonzini, den, jsnow

From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>

The patch adds the following command line option:

-dirty-bitmap [option1=val1][,option2=val2]...
    Available options are:
    name         The name for the bitmap (necessary).

    file         The file to load the bitmap from.

    file_id      When specified with 'file' option, then this file will
                 be available through this id for other -dirty-bitmap
                 options when specified without 'file' option, then it
                 is a reference to 'file', specified with another
                 -dirty-bitmap option, and it will be used to load the
                 bitmap from.

    drive        The drive to bind the bitmap to. It should be specified
                 as 'id' suboption of one of -drive options. If nor
                 'file' neither 'file_id' are specified, then the bitmap
                 will be loaded from that drive (internal dirty bitmap).

    granularity  The granularity for the bitmap. Not necessary, the
                 default value may be used.

    enabled      on|off. Default is 'on'. Disabled bitmaps are not
                 changing regardless of writes to corresponding drive.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 blockdev.c                |  38 ++++++++++++++++++
 include/sysemu/blockdev.h |   1 +
 include/sysemu/sysemu.h   |   1 +
 qemu-options.hx           |  37 +++++++++++++++++
 vl.c                      | 100 ++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 177 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 5eaf77e..2a74395 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -176,6 +176,11 @@ QemuOpts *drive_def(const char *optstr)
     return qemu_opts_parse(qemu_find_opts("drive"), optstr, 0);
 }
 
+QemuOpts *dirty_bitmap_def(const char *optstr)
+{
+    return qemu_opts_parse(qemu_find_opts("dirty-bitmap"), optstr, 0);
+}
+
 QemuOpts *drive_add(BlockInterfaceType type, int index, const char *file,
                     const char *optstr)
 {
@@ -3093,6 +3098,39 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
     return head;
 }
 
+QemuOptsList qemu_dirty_bitmap_opts = {
+    .name = "dirty-bitmap",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_dirty_bitmap_opts.head),
+    .desc = {
+        {
+            .name = "name",
+            .type = QEMU_OPT_STRING,
+            .help = "Name of the dirty bitmap",
+        },{
+            .name = "file",
+            .type = QEMU_OPT_STRING,
+            .help = "file name to load the bitmap from",
+        },{
+            .name = "file_id",
+            .type = QEMU_OPT_STRING,
+            .help = "node name to load the bitmap from (or to set id for"
+                    " for file, opened by previous option)",
+        },{
+            .name = "drive",
+            .type = QEMU_OPT_STRING,
+            .help = "drive id to bind the bitmap to",
+        },{
+            .name = "granularity",
+            .type = QEMU_OPT_NUMBER,
+            .help = "granularity",
+        },{
+            .name = "enabled",
+            .type = QEMU_OPT_BOOL,
+            .help = "enabled flag (default is 'on')",
+        }
+    }
+};
+
 QemuOptsList qemu_common_drive_opts = {
     .name = "drive",
     .head = QTAILQ_HEAD_INITIALIZER(qemu_common_drive_opts.head),
diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index 7ca59b5..5b101b8 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -57,6 +57,7 @@ int drive_get_max_devs(BlockInterfaceType type);
 DriveInfo *drive_get_next(BlockInterfaceType type);
 
 QemuOpts *drive_def(const char *optstr);
+QemuOpts *dirty_bitmap_def(const char *optstr);
 QemuOpts *drive_add(BlockInterfaceType type, int index, const char *file,
                     const char *optstr);
 DriveInfo *drive_new(QemuOpts *arg, BlockInterfaceType block_default_type);
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 8a52934..681a8f3 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -207,6 +207,7 @@ bool usb_enabled(void);
 
 extern QemuOptsList qemu_legacy_drive_opts;
 extern QemuOptsList qemu_common_drive_opts;
+extern QemuOptsList qemu_dirty_bitmap_opts;
 extern QemuOptsList qemu_drive_opts;
 extern QemuOptsList qemu_chardev_opts;
 extern QemuOptsList qemu_device_opts;
diff --git a/qemu-options.hx b/qemu-options.hx
index ec356f6..5e93122 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -614,6 +614,43 @@ qemu-system-i386 -hda a -hdb b
 @end example
 ETEXI
 
+DEF("dirty-bitmap", HAS_ARG, QEMU_OPTION_dirty_bitmap,
+    "-dirty-bitmap name=name[,file=file][,file_id=file_id][,drive=@var{id}]\n"
+    "              [,granularity=granularity][,enabled=on|off]\n",
+    QEMU_ARCH_ALL)
+STEXI
+@item -dirty-bitmap @var{option}[,@var{option}[,@var{option}[,...]]]
+@findex -dirty-bitmap
+
+Define a dirty-bitmap. Valid options are:
+
+@table @option
+@item name=@var{name}
+The name of the bitmap. Should be unique per @var{file}/@var{drive} and per
+@var{for_drive}.
+@item file=@var{file}
+The separate qcow2 file for loading the bitmap @var{name} from it.
+@item file_id=@var{file_id}
+When specified with @var{file} option, then this @var{file} will be available
+through this @var{file_id} for other @option{-dirty-bitmap} options.
+When specified without @var{file} option, then it is a reference to @var{file},
+specified with another @option{-dirty-bitmap} option, and it will be used to
+load the bitmap from.
+@item drive=@var{drive}
+The drive to bind the bitmap to. It should be specified as @var{id} suboption
+of one of @option{-drive} options.
+If nor @var{file} neither @var{file_id} are specified, then the bitmap will be
+loaded from that drive (internal dirty bitmap).
+@item granularity=@var{granularity}
+Granularity (in bytes) for created dirty bitmap. If the bitmap is already
+exists in specified @var{file}/@var{file_id}/@var{device} it's granularity will
+not be changed but only checked (an error will be generated if this check
+fails).
+@item enabled=@var{enabled}
+Enabled flag for the bitmap. By default the bitmap will be enabled.
+@end table
+ETEXI
+
 DEF("mtdblock", HAS_ARG, QEMU_OPTION_mtdblock,
     "-mtdblock file  use 'file' as on-board Flash memory image\n",
     QEMU_ARCH_ALL)
diff --git a/vl.c b/vl.c
index 83871f5..fb16d0c 100644
--- a/vl.c
+++ b/vl.c
@@ -1091,6 +1091,95 @@ static int cleanup_add_fd(QemuOpts *opts, void *opaque)
 #define MTD_OPTS ""
 #define SD_OPTS ""
 
+static int dirty_bitmap_func(QemuOpts *opts, void *opaque)
+{
+    Error *local_err = NULL;
+    Error **errp = &local_err;
+    BlockDriverState *file_bs = NULL, *for_bs = NULL;
+    BdrvDirtyBitmap *bitmap = NULL;
+
+    const char *name = qemu_opt_get(opts, "name");
+    const char *drive = qemu_opt_get(opts, "drive");
+    const char *file = qemu_opt_get(opts, "file");
+    const char *file_id = qemu_opt_get(opts, "file_id");
+
+    uint64_t granularity = qemu_opt_get_number(opts, "granularity", 0);
+    bool enabled = qemu_opt_get_bool(opts, "enabled", true);
+
+    if (name == NULL) {
+        error_setg(errp, "'name' option is necessary");
+        goto fail;
+    }
+
+    if (drive == NULL) {
+        error_setg(errp, "'drive' option is necessary");
+        goto fail;
+    }
+
+    for_bs = bdrv_lookup_bs(drive, NULL, errp);
+    if (for_bs == NULL) {
+        goto fail;
+    }
+
+    if (file != NULL) {
+        QDict *options = NULL;
+        if (file_id != NULL) {
+            options = qdict_new();
+            qdict_put(options, "node-name", qstring_from_str(file_id));
+        }
+
+        bdrv_open(&file_bs, file, NULL, options, 0, NULL, errp);
+        if (options) {
+            QDECREF(options);
+        }
+        if (file_bs == NULL) {
+            goto fail;
+        }
+    } else if (file_id != NULL) {
+        file_bs = bdrv_find_node(file_id);
+        if (file_bs == NULL) {
+            error_setg(errp, "node '%s' is not found", drive);
+            goto fail;
+        }
+    } else {
+        file_bs = for_bs;
+    }
+
+    if (granularity == 0) {
+        granularity = bdrv_get_default_bitmap_granularity(for_bs);
+    }
+
+    bitmap = bdrv_load_dirty_bitmap(for_bs, file_bs, granularity, name,
+                                    errp);
+    if (*errp != NULL) {
+        goto fail;
+    }
+
+    if (bitmap == NULL) {
+        /* bitmap is not found in file_bs */
+        bitmap = bdrv_create_dirty_bitmap(for_bs, granularity, name, errp);
+        if (!bitmap) {
+            goto fail;
+        }
+    }
+
+    bdrv_dirty_bitmap_set_file(bitmap, file_bs);
+
+    if (!enabled) {
+        bdrv_disable_dirty_bitmap(bitmap);
+    }
+
+    return 0;
+
+fail:
+    error_report("-dirty-bitmap: %s", error_get_pretty(local_err));
+    error_free(local_err);
+    if (file_bs != NULL) {
+        bdrv_close(file_bs);
+    }
+    return -1;
+}
+
 static int drive_init_func(QemuOpts *opts, void *opaque)
 {
     BlockInterfaceType *block_default_type = opaque;
@@ -2790,6 +2879,7 @@ int main(int argc, char **argv, char **envp)
     module_call_init(MODULE_INIT_QOM);
 
     qemu_add_opts(&qemu_drive_opts);
+    qemu_add_opts(&qemu_dirty_bitmap_opts);
     qemu_add_drive_opts(&qemu_legacy_drive_opts);
     qemu_add_drive_opts(&qemu_common_drive_opts);
     qemu_add_drive_opts(&qemu_drive_opts);
@@ -2918,6 +3008,11 @@ int main(int argc, char **argv, char **envp)
                     exit(1);
                 }
                 break;
+            case QEMU_OPTION_dirty_bitmap:
+                if (dirty_bitmap_def(optarg) == NULL) {
+                    exit(1);
+                }
+                break;
             case QEMU_OPTION_set:
                 if (qemu_set_option(optarg) != 0)
                     exit(1);
@@ -4198,6 +4293,11 @@ int main(int argc, char **argv, char **envp)
 
     parse_numa_opts(machine_class);
 
+    if (qemu_opts_foreach(qemu_find_opts("dirty-bitmap"), dirty_bitmap_func,
+                          NULL, 1) != 0) {
+        exit(1);
+    }
+
     if (qemu_opts_foreach(qemu_find_opts("mon"), mon_init_func, NULL, 1) != 0) {
         exit(1);
     }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [Qemu-devel] [PATCH 8/8] iotests: test internal persistent dirty bitmap
  2015-06-08 15:21 [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 7/8] qemu: command line option " Vladimir Sementsov-Ogievskiy
@ 2015-06-08 15:21 ` Vladimir Sementsov-Ogievskiy
  2015-06-09 16:17   ` Eric Blake
  2015-06-10 15:27 ` [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps Stefan Hajnoczi
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-08 15:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, vsementsov, Vladimir Sementsov-Ogievskiy, stefanha,
	pbonzini, den, jsnow

From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>

The test performs several vm reloads with checking and updating dirty
bitmap.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/118        | 83 +++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/118.out    |  5 +++
 tests/qemu-iotests/group      |  1 +
 tests/qemu-iotests/iotests.py |  6 ++++
 4 files changed, 95 insertions(+)
 create mode 100755 tests/qemu-iotests/118
 create mode 100644 tests/qemu-iotests/118.out

diff --git a/tests/qemu-iotests/118 b/tests/qemu-iotests/118
new file mode 100755
index 0000000..f6e91aa
--- /dev/null
+++ b/tests/qemu-iotests/118
@@ -0,0 +1,83 @@
+#!/usr/bin/env python
+#
+# Tests for persistent dirty bitmaps.
+#
+# (C) Vladimir Sementsov-Ogievskiy 2015
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import iotests
+import time
+from iotests import qemu_img
+
+disk = os.path.join(iotests.test_dir, 'disk')
+
+size   = 0x40000000 # 1G
+sector_size = 512
+granularity = 0x10000
+regions1 = [
+    { 'start': 0,          'count': 0x100000 },
+    { 'start': 0x200000,   'count': 0x100000 }
+    ]
+regions2 = [
+    { 'start': 0x10000000, 'count': 0x20000  },
+    { 'start': 0x39990000, 'count': 0x10000  }
+    ]
+
+class TestDirtyBitmapMigration(iotests.QMPTestCase):
+
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, disk, str(size))
+        self.vm = iotests.VM().add_drive(disk)
+        self.vm.add_dirty_bitmap('bitmap', 'drive0')
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(disk)
+
+    def getMd5(self):
+        result = self.vm.qmp('query-block');
+        return result['return'][0]['dirty-bitmaps'][0]['md5']
+
+    def checkBitmap(self, md5):
+        result = self.vm.qmp('query-block');
+        self.assert_qmp(result, 'return[0]/dirty-bitmaps[0]/md5', md5);
+
+    def writeRegions(self, regions):
+        for r in regions:
+          self.vm.hmp_qemu_io('drive0',
+                                'write %d %d' % (r['start'], r['count']))
+
+    def test_persistent(self):
+        self.writeRegions(regions1)
+        md5 = self.getMd5()
+
+        self.vm.shutdown()
+        self.vm.launch()
+
+        self.checkBitmap(md5)
+        self.writeRegions(regions2)
+        md5 = self.getMd5()
+
+        self.vm.shutdown()
+        self.vm.launch()
+
+        self.checkBitmap(md5)
+
+
+if __name__ == '__main__':
+    iotests.main()
diff --git a/tests/qemu-iotests/118.out b/tests/qemu-iotests/118.out
new file mode 100644
index 0000000..ae1213e
--- /dev/null
+++ b/tests/qemu-iotests/118.out
@@ -0,0 +1,5 @@
+.
+----------------------------------------------------------------------
+Ran 1 tests
+
+OK
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 6812681..8cd4db1 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -122,6 +122,7 @@
 115 rw auto
 116 rw auto quick
 117 rw auto quick
+118 rw auto quick
 121 rw auto
 122 rw auto
 123 rw auto quick
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 945f2a2..013d5da 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -140,6 +140,12 @@ class VM(object):
         self._args.append(desc)
         return self
 
+    def add_dirty_bitmap(self, name, drive):
+        '''Add dirty bitmap parameter to VM cmd'''
+        self._args.append('-dirty-bitmap')
+        self._args.append('name=%s,drive=%s' % (name, drive))
+        return self
+
     def pause_drive(self, drive, event=None):
         '''Pause drive r/w operations'''
         if not event:
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] qcow2: add autoclear bit for dirty bitmaps
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 6/8] qcow2: add autoclear bit for dirty bitmaps Vladimir Sementsov-Ogievskiy
@ 2015-06-09 15:49   ` Stefan Hajnoczi
  2015-06-09 15:50   ` Stefan Hajnoczi
  2015-06-10 23:42   ` John Snow
  2 siblings, 0 replies; 76+ messages in thread
From: Stefan Hajnoczi @ 2015-06-09 15:49 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha, den,
	pbonzini, jsnow

[-- Attachment #1: Type: text/plain, Size: 901 bytes --]

On Mon, Jun 08, 2015 at 06:21:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> @@ -928,8 +936,9 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>      }
>  
>      /* Clear unknown autoclear feature bits */
> -    if (!bs->read_only && !(flags & BDRV_O_INCOMING) && s->autoclear_features) {
> -        s->autoclear_features = 0;
> +    if (!bs->read_only && !(flags & BDRV_O_INCOMING) &&
> +        (s->autoclear_features & ~QCOW2_AUTOCLEAR_MASK)) {
> +        s->autoclear_features |= QCOW2_AUTOCLEAR_MASK;

This should be bitwise-and instead of bitwise-or:

s->autoclear_features &= QCOW2_AUTOCLEAR_MASK

Otherwise we set features that happen to be in QCOW2_AUTOCLEAR_MASK but
were not enabled by the user.  Right now that's not fatal but if other
features are added to QCOW2_AUTOCLEAR_MASK it could introduce a bug,
depending on the feature semantics.

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] qcow2: add autoclear bit for dirty bitmaps
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 6/8] qcow2: add autoclear bit for dirty bitmaps Vladimir Sementsov-Ogievskiy
  2015-06-09 15:49   ` Stefan Hajnoczi
@ 2015-06-09 15:50   ` Stefan Hajnoczi
  2015-08-27  7:45     ` Vladimir Sementsov-Ogievskiy
  2015-06-10 23:42   ` John Snow
  2 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2015-06-09 15:50 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha, den,
	pbonzini, jsnow

[-- Attachment #1: Type: text/plain, Size: 766 bytes --]

On Mon, Jun 08, 2015 at 06:21:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 406e55d..f85a55a 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -182,6 +182,14 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>                  return ret;
>              }
>  
> +            if (!(s->autoclear_features & QCOW2_AUTOCLEAR_DIRTY_BITMAPS) &&
> +                s->nb_dirty_bitmaps > 0) {
> +                ret = qcow2_delete_all_dirty_bitmaps(bs, errp);
> +                if (ret < 0) {
> +                    return ret;
> +                }
> +            }
> +

What if the file is read-only?

We shouldn't modify the file in qcow2_read_extensions().

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 4/8] block: add bdrv_load_dirty_bitmap
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 4/8] block: add bdrv_load_dirty_bitmap Vladimir Sementsov-Ogievskiy
@ 2015-06-09 16:01   ` Stefan Hajnoczi
  2015-06-10 22:33     ` John Snow
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2015-06-09 16:01 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha, den,
	pbonzini, jsnow

[-- Attachment #1: Type: text/plain, Size: 1378 bytes --]

On Mon, Jun 08, 2015 at 06:21:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> +BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs,
> +                                        BlockDriverState *file,
> +                                        int granularity,
> +                                        const char *name,
> +                                        Error **errp)
> +{
> +    BlockDriver *drv = file->drv;
> +    if (!drv) {
> +        return NULL;
> +    }
> +    if (drv->bdrv_dirty_bitmap_load) {
> +        BdrvDirtyBitmap *bitmap;
> +        uint64_t bitmap_size = bdrv_nb_sectors(bs);
> +        uint8_t *buf = drv->bdrv_dirty_bitmap_load(file, name, bitmap_size,
> +                                                   granularity);
> +        if (buf == NULL) {
> +            return NULL;
> +        }
> +
> +        bitmap = bdrv_create_dirty_bitmap(bs, granularity, name, errp);
> +        if (bitmap == NULL) {
> +            g_free(buf);
> +            return NULL;
> +        }
> +
> +        hbitmap_deserialize_part(bitmap->bitmap, buf, 0, bitmap_size);
> +        hbitmap_deserialize_finish(bitmap->bitmap);

How about passing bitmap and errp into drv->bdrv_dirty_bitmap_load?
That way bdrv_dirty_bitmap_load() can stream using
hbitmap_deserialize_part() and does not need to allocate the full
bitmap.  It can also report errors properly.

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification Vladimir Sementsov-Ogievskiy
@ 2015-06-09 16:01   ` John Snow
  2015-06-09 17:03   ` Stefan Hajnoczi
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 76+ messages in thread
From: John Snow @ 2015-06-09 16:01 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, pbonzini, Vladimir Sementsov-Ogievskiy, stefanha, den



On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> 
> Persistent dirty bitmaps will be saved into qcow2 files. It may be used
> as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
> other drives (there may be qcow2 file with zero disk size but with
> several dirty bitmaps for other drives).
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  docs/specs/qcow2.txt | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 66 insertions(+)
> 
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index 121dfc8..0fffba2 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -123,6 +123,7 @@ be stored. Each extension has a structure like the following:
>                          0x00000000 - End of the header extension area
>                          0xE2792ACA - Backing file format name
>                          0x6803f857 - Feature name table
> +                        0x23852875 - Dirty bitmaps
>                          other      - Unknown header extension, can be safely
>                                       ignored
>  
> @@ -166,6 +167,19 @@ the header extension data. Each entry look like this:
>                      terminated if it has full length)
>  
>  
> +== Dirty bitmaps ==
> +
> +Dirty bitmaps is an optional header extension. It provides a possibility of
> +storing dirty bitmaps in qcow2 image. The fields are:
> +

I would say "It provides the ability to store dirty bitmaps in a qcow2
image." We have more than the possibility now :)

> +          0 -  3:  nb_dirty_bitmaps
> +                   Number of dirty bitmaps contained in the image
> +

Most fields seem to be documented as complete sentences, so maybe "The
number of dirty bitmaps contained in the image." and add the period.

> +          4 - 11:  dirty_bitmaps_offset
> +                   Offset into the image file at which the dirty bitmaps table
> +                   starts. Must be aligned to a cluster boundary.
> +
> +
>  == Host cluster management ==
>  
>  qcow2 manages the allocation of host clusters by maintaining a reference count
> @@ -360,3 +374,55 @@ Snapshot table entry:
>  
>          variable:   Padding to round up the snapshot table entry size to the
>                      next multiple of 8.
> +
> +
> +== Dirty bitmaps ==
> +
> +The feature supports storing several dirty bitmaps in the qcow2 file.

You could drop "several" from this sentence, we support an arbitrary number.

> +
> +=== Cluster mapping ===
> +
> +Dirty bitmaps are stored using a ONE-level structure for the mapping of
> +bitmaps to host clusters. There is only an L1 table.
> +
> +The L1 table has a variable size (stored in the Bitmap table entry) and may
> +use multiple clusters, however it must be contiguous in the image file.
> +
> +Given an offset into the bitmap, the offset into the image file can be
> +obtained as follows:
> +
> +    offset = l1_table[offset / cluster_size] + (offset % cluster_size)
> +
> +L1 table entry:
> +
> +    Bit  0 -  61:   Standard cluster descriptor
> +
> +        62 -  63:   Reserved
> +
> +=== Bitmap table ===
> +
> +A directory of all bitmaps is stored in the bitmap table, a contiguous area in
> +the image file, whose starting offset and length are given by the header fields
> +dirty_bitmaps_offset and nb_dirty_bitmaps. The entries of the bitmap table have
> +variable length, depending on the length of name and extra data.
> +
> +Bitmap table entry:
> +
> +    Byte 0 -  7:    Offset into the image file at which the L1 table for the
> +                    bitmap starts. Must be aligned to a cluster boundary.
> +
> +         8 - 11:    Number of entries in the L1 table of the bitmap
> +
> +        12 - 15:    Bitmap granularity in bytes
> +
> +        16 - 23:    Bitmap size in sectors
> +
> +        24 - 25:    Size of the bitmap name
> +

This is, believe it or not, the first place in code that I am aware of
that actually places a limit on how big a bitmap name can be (64K!) --
we should probably clamp this value to something even lower (1024 is
probably graciously sufficient) and enforce that in the various bitmap
add/create routines.

> +        variable:   The name of the bitmap (not null terminated)
> +
> +        variable:   Padding to round up the bitmap table entry size to the
> +                    next multiple of 8.
> +
> +The fields "size", "granularity" and "name" are corresponding with the fields
> +in struct BdrvDirtyBitmap.
> 

Not yet being intricately familiar with qcow2, this looks good to me.

--js

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 8/8] iotests: test internal persistent dirty bitmap
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 8/8] iotests: test internal persistent dirty bitmap Vladimir Sementsov-Ogievskiy
@ 2015-06-09 16:17   ` Eric Blake
  0 siblings, 0 replies; 76+ messages in thread
From: Eric Blake @ 2015-06-09 16:17 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, Vladimir Sementsov-Ogievskiy, stefanha, pbonzini, den,
	jsnow

[-- Attachment #1: Type: text/plain, Size: 1247 bytes --]

On 06/08/2015 09:21 AM, Vladimir Sementsov-Ogievskiy wrote:
> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> 
> The test performs several vm reloads with checking and updating dirty
> bitmap.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  tests/qemu-iotests/118        | 83 +++++++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/118.out    |  5 +++
>  tests/qemu-iotests/group      |  1 +
>  tests/qemu-iotests/iotests.py |  6 ++++
>  4 files changed, 95 insertions(+)
>  create mode 100755 tests/qemu-iotests/118
>  create mode 100644 tests/qemu-iotests/118.out
> 
> diff --git a/tests/qemu-iotests/118 b/tests/qemu-iotests/118
> new file mode 100755
> index 0000000..f6e91aa
> --- /dev/null
> +++ b/tests/qemu-iotests/118
> @@ -0,0 +1,83 @@
> +#!/usr/bin/env python
> +#
> +# Tests for persistent dirty bitmaps.
> +#
> +# (C) Vladimir Sementsov-Ogievskiy 2015

Please also use the word "Copyright".  The notation "(C)" afterwards is
optional, but the word "Copyright" makes a difference in some legal
situations, or so I've been taught.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature Vladimir Sementsov-Ogievskiy
@ 2015-06-09 16:52   ` Stefan Hajnoczi
  2015-06-10 14:30   ` Stefan Hajnoczi
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 76+ messages in thread
From: Stefan Hajnoczi @ 2015-06-09 16:52 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha, den,
	pbonzini, jsnow

[-- Attachment #1: Type: text/plain, Size: 2680 bytes --]

On Mon, Jun 08, 2015 at 06:21:20PM +0300, Vladimir Sementsov-Ogievskiy wrote:

I haven't fully reviewed this patch yet but here are initial comments.

> +int qcow2_read_dirty_bitmaps(BlockDriverState *bs)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    QCowDirtyBitmapHeader h;
> +    QCowDirtyBitmap *bm;
> +    int i, name_size;
> +    int64_t offset;
> +    int ret;
> +
> +    if (!s->nb_dirty_bitmaps) {
> +        s->dirty_bitmaps = NULL;
> +        s->dirty_bitmaps_size = 0;
> +        return 0;
> +    }
> +
> +    offset = s->dirty_bitmaps_offset;
> +    s->dirty_bitmaps = g_new0(QCowDirtyBitmap, s->nb_dirty_bitmaps);
> +
> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +        /* Read statically sized part of the dirty_bitmap header */
> +        offset = align_offset(offset, 8);
> +        ret = bdrv_pread(bs->file, offset, &h, sizeof(h));
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +
> +        offset += sizeof(h);
> +        bm = s->dirty_bitmaps + i;
> +        bm->l1_table_offset = be64_to_cpu(h.l1_table_offset);
> +        bm->l1_size = be32_to_cpu(h.l1_size);
> +        bm->bitmap_granularity = be32_to_cpu(h.bitmap_granularity);
> +        bm->bitmap_size = be64_to_cpu(h.bitmap_size);

Input validation is missing.  These could be junk values.  Min, max,
alignment, etc need to be checked.

> @@ -90,6 +91,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>      QCowExtension ext;
>      uint64_t offset;
>      int ret;
> +    Qcow2DirtyBitmapHeaderExt dirty_bitmaps_ext;
>  
>  #ifdef DEBUG_EXT
>      printf("qcow2_read_extensions: start=%ld end=%ld\n", start_offset, end_offset);
> @@ -160,6 +162,33 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>              }
>              break;
>  
> +        case QCOW2_EXT_MAGIC_DIRTY_BITMAPS:
> +            ret = bdrv_pread(bs->file, offset, &dirty_bitmaps_ext, ext.len);
> +            if (ret < 0) {
> +                error_setg_errno(errp, -ret, "ERROR: dirty_bitmaps_ext: "
> +                                 "Could not read ext header");
> +                return ret;
> +            }
> +
> +            be64_to_cpus(&dirty_bitmaps_ext.dirty_bitmaps_offset);
> +            be32_to_cpus(&dirty_bitmaps_ext.nb_dirty_bitmaps);
> +
> +            s->dirty_bitmaps_offset = dirty_bitmaps_ext.dirty_bitmaps_offset;
> +            s->nb_dirty_bitmaps = dirty_bitmaps_ext.nb_dirty_bitmaps;
> +
> +            ret = qcow2_read_dirty_bitmaps(bs);

Missing input validation.  We cannot trust dirty_bitmaps_offset or
nb_dirty_bitmaps.

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification Vladimir Sementsov-Ogievskiy
  2015-06-09 16:01   ` John Snow
@ 2015-06-09 17:03   ` Stefan Hajnoczi
  2015-06-10  8:19     ` Vladimir Sementsov-Ogievskiy
  2015-06-10 15:34   ` Kevin Wolf
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2015-06-09 17:03 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha, den,
	pbonzini, jsnow

[-- Attachment #1: Type: text/plain, Size: 5155 bytes --]

On Mon, Jun 08, 2015 at 06:21:19PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> 
> Persistent dirty bitmaps will be saved into qcow2 files. It may be used
> as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
> other drives (there may be qcow2 file with zero disk size but with
> several dirty bitmaps for other drives).
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  docs/specs/qcow2.txt | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 66 insertions(+)
> 
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index 121dfc8..0fffba2 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -123,6 +123,7 @@ be stored. Each extension has a structure like the following:
>                          0x00000000 - End of the header extension area
>                          0xE2792ACA - Backing file format name
>                          0x6803f857 - Feature name table
> +                        0x23852875 - Dirty bitmaps
>                          other      - Unknown header extension, can be safely
>                                       ignored
>  
> @@ -166,6 +167,19 @@ the header extension data. Each entry look like this:
>                      terminated if it has full length)
>  
>  
> +== Dirty bitmaps ==
> +
> +Dirty bitmaps is an optional header extension. It provides a possibility of
> +storing dirty bitmaps in qcow2 image. The fields are:
> +
> +          0 -  3:  nb_dirty_bitmaps
> +                   Number of dirty bitmaps contained in the image

Is there a maximum?

> +
> +          4 - 11:  dirty_bitmaps_offset
> +                   Offset into the image file at which the dirty bitmaps table
> +                   starts. Must be aligned to a cluster boundary.

The autoclear feature bit is undocumented.

>  == Host cluster management ==
>  
>  qcow2 manages the allocation of host clusters by maintaining a reference count
> @@ -360,3 +374,55 @@ Snapshot table entry:
>  
>          variable:   Padding to round up the snapshot table entry size to the
>                      next multiple of 8.
> +
> +
> +== Dirty bitmaps ==
> +
> +The feature supports storing several dirty bitmaps in the qcow2 file.
> +
> +=== Cluster mapping ===
> +
> +Dirty bitmaps are stored using a ONE-level structure for the mapping of
> +bitmaps to host clusters. There is only an L1 table.
> +
> +The L1 table has a variable size (stored in the Bitmap table entry) and may
> +use multiple clusters, however it must be contiguous in the image file.

The use of "L1 table" could be confusing.  The refcount metadata uses
"refcount table" and "refcount block" to describe a one-level table.

> +
> +Given an offset into the bitmap, the offset into the image file can be
> +obtained as follows:
> +
> +    offset = l1_table[offset / cluster_size] + (offset % cluster_size)

It might help to add granularity to this formula.

Instead of "offset", "bit_number" or "bitnr" might be clearer since
"offset" means something different in other parts of the document.

> +
> +L1 table entry:
> +
> +    Bit  0 -  61:   Standard cluster descriptor
> +
> +        62 -  63:   Reserved

Do you really want to use the standard cluster descriptor with it's zero
bit?

Since bitmaps don't honor backing files there doesn't seem much point in
using the zero bit, things are simpler if just bits 9-55 are contain the
host cluster offset and 0 means the cluster is unallocated.

By honoring the zero bit there are three states:
1. Zero bit set, read zeroes
2. Zero bit not set, host cluster offset != 0, bits valid
3. Zero bit not set, host cluster offset == 0, unallocated

State 1 is not useful.

> +=== Bitmap table ===
> +
> +A directory of all bitmaps is stored in the bitmap table, a contiguous area in
> +the image file, whose starting offset and length are given by the header fields
> +dirty_bitmaps_offset and nb_dirty_bitmaps. The entries of the bitmap table have
> +variable length, depending on the length of name and extra data.
> +
> +Bitmap table entry:
> +
> +    Byte 0 -  7:    Offset into the image file at which the L1 table for the
> +                    bitmap starts. Must be aligned to a cluster boundary.
> +
> +         8 - 11:    Number of entries in the L1 table of the bitmap
> +
> +        12 - 15:    Bitmap granularity in bytes
> +
> +        16 - 23:    Bitmap size in sectors
> +
> +        24 - 25:    Size of the bitmap name
> +
> +        variable:   The name of the bitmap (not null terminated)
> +
> +        variable:   Padding to round up the bitmap table entry size to the
> +                    next multiple of 8.
> +
> +The fields "size", "granularity" and "name" are corresponding with the fields
> +in struct BdrvDirtyBitmap.

Referring to the internals of a C struct in QEMU is not appropriate for
a file format specification.  Please document the fields fully including
their constraints, minimums, maximums, etc.

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-09 17:03   ` Stefan Hajnoczi
@ 2015-06-10  8:19     ` Vladimir Sementsov-Ogievskiy
  2015-06-10  8:49       ` Vladimir Sementsov-Ogievskiy
                         ` (2 more replies)
  0 siblings, 3 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-10  8:19 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha, den,
	pbonzini, jsnow

On 09.06.2015 20:03, Stefan Hajnoczi wrote:
> On Mon, Jun 08, 2015 at 06:21:19PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>
>> Persistent dirty bitmaps will be saved into qcow2 files. It may be used
>> as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
>> other drives (there may be qcow2 file with zero disk size but with
>> several dirty bitmaps for other drives).
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   docs/specs/qcow2.txt | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 66 insertions(+)
>>
>> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
>> index 121dfc8..0fffba2 100644
>> --- a/docs/specs/qcow2.txt
>> +++ b/docs/specs/qcow2.txt
>> @@ -123,6 +123,7 @@ be stored. Each extension has a structure like the following:
>>                           0x00000000 - End of the header extension area
>>                           0xE2792ACA - Backing file format name
>>                           0x6803f857 - Feature name table
>> +                        0x23852875 - Dirty bitmaps
>>                           other      - Unknown header extension, can be safely
>>                                        ignored
>>   
>> @@ -166,6 +167,19 @@ the header extension data. Each entry look like this:
>>                       terminated if it has full length)
>>   
>>   
>> +== Dirty bitmaps ==
>> +
>> +Dirty bitmaps is an optional header extension. It provides a possibility of
>> +storing dirty bitmaps in qcow2 image. The fields are:
>> +
>> +          0 -  3:  nb_dirty_bitmaps
>> +                   Number of dirty bitmaps contained in the image
> Is there a maximum?
hmm. any proposals for this?
>
>> +
>> +          4 - 11:  dirty_bitmaps_offset
>> +                   Offset into the image file at which the dirty bitmaps table
>> +                   starts. Must be aligned to a cluster boundary.
> The autoclear feature bit is undocumented.
>
>>   == Host cluster management ==
>>   
>>   qcow2 manages the allocation of host clusters by maintaining a reference count
>> @@ -360,3 +374,55 @@ Snapshot table entry:
>>   
>>           variable:   Padding to round up the snapshot table entry size to the
>>                       next multiple of 8.
>> +
>> +
>> +== Dirty bitmaps ==
>> +
>> +The feature supports storing several dirty bitmaps in the qcow2 file.
>> +
>> +=== Cluster mapping ===
>> +
>> +Dirty bitmaps are stored using a ONE-level structure for the mapping of
>> +bitmaps to host clusters. There is only an L1 table.
>> +
>> +The L1 table has a variable size (stored in the Bitmap table entry) and may
>> +use multiple clusters, however it must be contiguous in the image file.
> The use of "L1 table" could be confusing.  The refcount metadata uses
> "refcount table" and "refcount block" to describe a one-level table.
I agree. Hmm.. dirty bitmaps table? ok?
>
>> +
>> +Given an offset into the bitmap, the offset into the image file can be
>> +obtained as follows:
>> +
>> +    offset = l1_table[offset / cluster_size] + (offset % cluster_size)
> It might help to add granularity to this formula.
>
> Instead of "offset", "bit_number" or "bitnr" might be clearer since
> "offset" means something different in other parts of the document.
Hmm. In my opinion, the bitmap here is stored as raw data. And 
granularity is an additional parameter (for deserializing this data). 
So, it is an offset in bytes for this data. The format is not for 
accessing bitmap bits, it's only for loading the whole bitmap one time.
>
>> +
>> +L1 table entry:
>> +
>> +    Bit  0 -  61:   Standard cluster descriptor
>> +
>> +        62 -  63:   Reserved
> Do you really want to use the standard cluster descriptor with it's zero
> bit?
>
> Since bitmaps don't honor backing files there doesn't seem much point in
> using the zero bit, things are simpler if just bits 9-55 are contain the
> host cluster offset and 0 means the cluster is unallocated.
>
> By honoring the zero bit there are three states:
> 1. Zero bit set, read zeroes
> 2. Zero bit not set, host cluster offset != 0, bits valid
> 3. Zero bit not set, host cluster offset == 0, unallocated
>
> State 1 is not useful.
>
>> +=== Bitmap table ===
>> +
>> +A directory of all bitmaps is stored in the bitmap table, a contiguous area in
>> +the image file, whose starting offset and length are given by the header fields
>> +dirty_bitmaps_offset and nb_dirty_bitmaps. The entries of the bitmap table have
>> +variable length, depending on the length of name and extra data.
>> +
>> +Bitmap table entry:
>> +
>> +    Byte 0 -  7:    Offset into the image file at which the L1 table for the
>> +                    bitmap starts. Must be aligned to a cluster boundary.
>> +
>> +         8 - 11:    Number of entries in the L1 table of the bitmap
>> +
>> +        12 - 15:    Bitmap granularity in bytes
>> +
>> +        16 - 23:    Bitmap size in sectors
>> +
>> +        24 - 25:    Size of the bitmap name
>> +
>> +        variable:   The name of the bitmap (not null terminated)
>> +
>> +        variable:   Padding to round up the bitmap table entry size to the
>> +                    next multiple of 8.
>> +
>> +The fields "size", "granularity" and "name" are corresponding with the fields
>> +in struct BdrvDirtyBitmap.
> Referring to the internals of a C struct in QEMU is not appropriate for
> a file format specification.  Please document the fields fully including
> their constraints, minimums, maximums, etc.


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-10  8:19     ` Vladimir Sementsov-Ogievskiy
@ 2015-06-10  8:49       ` Vladimir Sementsov-Ogievskiy
  2015-06-10 13:00       ` Eric Blake
  2015-06-10 13:24       ` Stefan Hajnoczi
  2 siblings, 0 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-10  8:49 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha, den,
	pbonzini, jsnow

On 10.06.2015 11:19, Vladimir Sementsov-Ogievskiy wrote:
> On 09.06.2015 20:03, Stefan Hajnoczi wrote:
>> On Mon, Jun 08, 2015 at 06:21:19PM +0300, Vladimir 
>> Sementsov-Ogievskiy wrote:
>>> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>>
>>> Persistent dirty bitmaps will be saved into qcow2 files. It may be used
>>> as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
>>> other drives (there may be qcow2 file with zero disk size but with
>>> several dirty bitmaps for other drives).
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   docs/specs/qcow2.txt | 66 
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 66 insertions(+)
>>>
>>> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
>>> index 121dfc8..0fffba2 100644
>>> --- a/docs/specs/qcow2.txt
>>> +++ b/docs/specs/qcow2.txt
>>> @@ -123,6 +123,7 @@ be stored. Each extension has a structure like 
>>> the following:
>>>                           0x00000000 - End of the header extension area
>>>                           0xE2792ACA - Backing file format name
>>>                           0x6803f857 - Feature name table
>>> +                        0x23852875 - Dirty bitmaps
>>>                           other      - Unknown header extension, can 
>>> be safely
>>>                                        ignored
>>>   @@ -166,6 +167,19 @@ the header extension data. Each entry look 
>>> like this:
>>>                       terminated if it has full length)
>>>     +== Dirty bitmaps ==
>>> +
>>> +Dirty bitmaps is an optional header extension. It provides a 
>>> possibility of
>>> +storing dirty bitmaps in qcow2 image. The fields are:
>>> +
>>> +          0 -  3:  nb_dirty_bitmaps
>>> +                   Number of dirty bitmaps contained in the image
>> Is there a maximum?
> hmm. any proposals for this?
>>
>>> +
>>> +          4 - 11:  dirty_bitmaps_offset
>>> +                   Offset into the image file at which the dirty 
>>> bitmaps table
>>> +                   starts. Must be aligned to a cluster boundary.
>> The autoclear feature bit is undocumented.
>>
>>>   == Host cluster management ==
>>>     qcow2 manages the allocation of host clusters by maintaining a 
>>> reference count
>>> @@ -360,3 +374,55 @@ Snapshot table entry:
>>>             variable:   Padding to round up the snapshot table entry 
>>> size to the
>>>                       next multiple of 8.
>>> +
>>> +
>>> +== Dirty bitmaps ==
>>> +
>>> +The feature supports storing several dirty bitmaps in the qcow2 file.
>>> +
>>> +=== Cluster mapping ===
>>> +
>>> +Dirty bitmaps are stored using a ONE-level structure for the 
>>> mapping of
>>> +bitmaps to host clusters. There is only an L1 table.
>>> +
>>> +The L1 table has a variable size (stored in the Bitmap table entry) 
>>> and may
>>> +use multiple clusters, however it must be contiguous in the image 
>>> file.
>> The use of "L1 table" could be confusing.  The refcount metadata uses
>> "refcount table" and "refcount block" to describe a one-level table.
> I agree. Hmm.. dirty bitmaps table? ok?
oh, no, bad idea. dirty bitmaps table is other thing))
>>
>>> +
>>> +Given an offset into the bitmap, the offset into the image file can be
>>> +obtained as follows:
>>> +
>>> +    offset = l1_table[offset / cluster_size] + (offset % cluster_size)
>> It might help to add granularity to this formula.
>>
>> Instead of "offset", "bit_number" or "bitnr" might be clearer since
>> "offset" means something different in other parts of the document.
> Hmm. In my opinion, the bitmap here is stored as raw data. And 
> granularity is an additional parameter (for deserializing this data). 
> So, it is an offset in bytes for this data. The format is not for 
> accessing bitmap bits, it's only for loading the whole bitmap one time.
>>
>>> +
>>> +L1 table entry:
>>> +
>>> +    Bit  0 -  61:   Standard cluster descriptor
>>> +
>>> +        62 -  63:   Reserved
>> Do you really want to use the standard cluster descriptor with it's zero
>> bit?
>>
>> Since bitmaps don't honor backing files there doesn't seem much point in
>> using the zero bit, things are simpler if just bits 9-55 are contain the
>> host cluster offset and 0 means the cluster is unallocated.
>>
>> By honoring the zero bit there are three states:
>> 1. Zero bit set, read zeroes
>> 2. Zero bit not set, host cluster offset != 0, bits valid
>> 3. Zero bit not set, host cluster offset == 0, unallocated
>>
>> State 1 is not useful.
>>
>>> +=== Bitmap table ===
>>> +
>>> +A directory of all bitmaps is stored in the bitmap table, a 
>>> contiguous area in
>>> +the image file, whose starting offset and length are given by the 
>>> header fields
>>> +dirty_bitmaps_offset and nb_dirty_bitmaps. The entries of the 
>>> bitmap table have
>>> +variable length, depending on the length of name and extra data.
>>> +
>>> +Bitmap table entry:
>>> +
>>> +    Byte 0 -  7:    Offset into the image file at which the L1 
>>> table for the
>>> +                    bitmap starts. Must be aligned to a cluster 
>>> boundary.
>>> +
>>> +         8 - 11:    Number of entries in the L1 table of the bitmap
>>> +
>>> +        12 - 15:    Bitmap granularity in bytes
>>> +
>>> +        16 - 23:    Bitmap size in sectors
>>> +
>>> +        24 - 25:    Size of the bitmap name
>>> +
>>> +        variable:   The name of the bitmap (not null terminated)
>>> +
>>> +        variable:   Padding to round up the bitmap table entry size 
>>> to the
>>> +                    next multiple of 8.
>>> +
>>> +The fields "size", "granularity" and "name" are corresponding with 
>>> the fields
>>> +in struct BdrvDirtyBitmap.
>> Referring to the internals of a C struct in QEMU is not appropriate for
>> a file format specification.  Please document the fields fully including
>> their constraints, minimums, maximums, etc.
>
>


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-10  8:19     ` Vladimir Sementsov-Ogievskiy
  2015-06-10  8:49       ` Vladimir Sementsov-Ogievskiy
@ 2015-06-10 13:00       ` Eric Blake
  2015-06-11 10:16         ` Vladimir Sementsov-Ogievskiy
  2015-06-10 13:24       ` Stefan Hajnoczi
  2 siblings, 1 reply; 76+ messages in thread
From: Eric Blake @ 2015-06-10 13:00 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Stefan Hajnoczi
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha,
	pbonzini, den, jsnow

[-- Attachment #1: Type: text/plain, Size: 1556 bytes --]

On 06/10/2015 02:19 AM, Vladimir Sementsov-Ogievskiy wrote:

>>> +Dirty bitmaps is an optional header extension. It provides a
>>> possibility of
>>> +storing dirty bitmaps in qcow2 image. The fields are:
>>> +
>>> +          0 -  3:  nb_dirty_bitmaps
>>> +                   Number of dirty bitmaps contained in the image
>> Is there a maximum?
> hmm. any proposals for this?
>>
>>> +
>>> +          4 - 11:  dirty_bitmaps_offset

I'm not sure if there is a reasonable cap on the number of dirty
bitmaps; I doubt that anyone will actually supply all 4G possible images
allowed by the four-byte field, but don't have a suggestion on a smaller
limit that doesn't feel arbitrary.

[meta-comment] It's very hard to pick out the new content in your reply
if you do not separate your new text with a newline both before and
after (as I'm doing here).


>>> +Dirty bitmaps are stored using a ONE-level structure for the mapping of
>>> +bitmaps to host clusters. There is only an L1 table.
>>> +
>>> +The L1 table has a variable size (stored in the Bitmap table entry)
>>> and may
>>> +use multiple clusters, however it must be contiguous in the image file.
>> The use of "L1 table" could be confusing.  The refcount metadata uses
>> "refcount table" and "refcount block" to describe a one-level table.
> I agree. Hmm.. dirty bitmaps table? ok?

"dirty bitmaps table" works for me, as a name for the one-level table.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-10  8:19     ` Vladimir Sementsov-Ogievskiy
  2015-06-10  8:49       ` Vladimir Sementsov-Ogievskiy
  2015-06-10 13:00       ` Eric Blake
@ 2015-06-10 13:24       ` Stefan Hajnoczi
  2015-06-11 10:19         ` Vladimir Sementsov-Ogievskiy
  2 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2015-06-10 13:24 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, Stefan Hajnoczi, qemu-devel, Vladimir Sementsov-Ogievskiy,
	pbonzini, den, jsnow

[-- Attachment #1: Type: text/plain, Size: 2156 bytes --]

On Wed, Jun 10, 2015 at 11:19:30AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 09.06.2015 20:03, Stefan Hajnoczi wrote:
> >On Mon, Jun 08, 2015 at 06:21:19PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> >>@@ -166,6 +167,19 @@ the header extension data. Each entry look like this:
> >>                      terminated if it has full length)
> >>+== Dirty bitmaps ==
> >>+
> >>+Dirty bitmaps is an optional header extension. It provides a possibility of
> >>+storing dirty bitmaps in qcow2 image. The fields are:
> >>+
> >>+          0 -  3:  nb_dirty_bitmaps
> >>+                   Number of dirty bitmaps contained in the image
> >Is there a maximum?
> hmm. any proposals for this?

65535 seems practical.

> >>+=== Cluster mapping ===
> >>+
> >>+Dirty bitmaps are stored using a ONE-level structure for the mapping of
> >>+bitmaps to host clusters. There is only an L1 table.
> >>+
> >>+The L1 table has a variable size (stored in the Bitmap table entry) and may
> >>+use multiple clusters, however it must be contiguous in the image file.
> >The use of "L1 table" could be confusing.  The refcount metadata uses
> >"refcount table" and "refcount block" to describe a one-level table.
> I agree. Hmm.. dirty bitmaps table? ok?

Yes, that is good.

> >>+
> >>+Given an offset into the bitmap, the offset into the image file can be
> >>+obtained as follows:
> >>+
> >>+    offset = l1_table[offset / cluster_size] + (offset % cluster_size)
> >It might help to add granularity to this formula.
> >
> >Instead of "offset", "bit_number" or "bitnr" might be clearer since
> >"offset" means something different in other parts of the document.
> Hmm. In my opinion, the bitmap here is stored as raw data. And granularity
> is an additional parameter (for deserializing this data). So, it is an
> offset in bytes for this data. The format is not for accessing bitmap bits,
> it's only for loading the whole bitmap one time.

You are right, it wasn't clear when I read this the first time.  My
problem was the "offset into the bitmap" doesn't have any units.  So
let's make this more explicit.  Can you document how to go from a bit
number down to the offset?

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature Vladimir Sementsov-Ogievskiy
  2015-06-09 16:52   ` Stefan Hajnoczi
@ 2015-06-10 14:30   ` Stefan Hajnoczi
  2015-06-12 19:02     ` John Snow
  2015-08-14 17:14     ` Vladimir Sementsov-Ogievskiy
  2015-06-11 23:04   ` John Snow
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 76+ messages in thread
From: Stefan Hajnoczi @ 2015-06-10 14:30 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, pbonzini, den,
	jsnow

[-- Attachment #1: Type: text/plain, Size: 7284 bytes --]

On Mon, Jun 08, 2015 at 06:21:20PM +0300, Vladimir Sementsov-Ogievskiy wrote:

I noticed a corner case, it's probably not a problem in practice:

Since the dirty bitmap is stored with the help of a BlockDriverState
(and its bs->file), it's possible that writing the bitmap will cause
bits in the bitmap to be dirtied!

> diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
> new file mode 100644
> index 0000000..bc0167c
> --- /dev/null
> +++ b/block/qcow2-dirty-bitmap.c
> @@ -0,0 +1,503 @@
> +/*
> + * Dirty bitmpas for the QCOW version 2 format

s/bitmpas/bitmaps/

> +int qcow2_read_dirty_bitmaps(BlockDriverState *bs)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    QCowDirtyBitmapHeader h;
> +    QCowDirtyBitmap *bm;
> +    int i, name_size;
> +    int64_t offset;
> +    int ret;
> +
> +    if (!s->nb_dirty_bitmaps) {
> +        s->dirty_bitmaps = NULL;
> +        s->dirty_bitmaps_size = 0;
> +        return 0;
> +    }
> +
> +    offset = s->dirty_bitmaps_offset;
> +    s->dirty_bitmaps = g_new0(QCowDirtyBitmap, s->nb_dirty_bitmaps);

Please use g_try_new0() and handle the NULL return value.
g_new/g_malloc abort the process if there is not enough memory.  When
opening untrusted image files it is possible that large values will be
encountered and allocations fail.  In that case .bdrv_open() should fail
instead of killing QEMU.

Using g_try_*() in QEMU is not an exact science but large data buffers
or allocations where external inputs influence the size are good
candidates.

Other allocations in these patches should do that too.

> +    /* Allocate space for the new dirty bitmap table */
> +    dirty_bitmaps_offset = qcow2_alloc_clusters(bs, dirty_bitmaps_size);
> +    offset = dirty_bitmaps_offset;
> +    if (offset < 0) {
> +        ret = offset;
> +        goto fail;
> +    }
> +    ret = bdrv_flush(bs);

Not sure there is a need for this.  The clusters are inaccessible since
no metadata points to them yet.  Therefore we don't need to flush yet
because there is no risk of seeing an inconsistent state.

> +    /* Write all dirty bitmaps to the new table */
> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +        bm = s->dirty_bitmaps + i;
> +        memset(&h, 0, sizeof(h));
> +        h.l1_table_offset = cpu_to_be64(bm->l1_table_offset);
> +        h.l1_size = cpu_to_be32(bm->l1_size);
> +        h.bitmap_granularity = cpu_to_be32(bm->bitmap_granularity);
> +        h.bitmap_size = cpu_to_be64(bm->bitmap_size);
> +
> +        name_size = strlen(bm->name);
> +        assert(name_size <= UINT16_MAX);
> +        h.name_size = cpu_to_be16(name_size);
> +        offset = align_offset(offset, 8);
> +
> +        ret = bdrv_pwrite(bs->file, offset, &h, sizeof(h));
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        offset += sizeof(h);
> +
> +        ret = bdrv_pwrite(bs->file, offset, bm->name, name_size);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        offset += name_size;
> +    }

If files have many thousands of bitmaps then this loop will be slow.  It
would be much faster to write out 1 cluster at a time.  This probably
doesn't matter in practice since this function doesn't get called much
and normally files will have few bitmaps.

> +
> +    /*
> +     * Update the header extension to point to the new dirty bitmap table. This
> +     * requires the new table and its refcounts to be stable on disk.
> +     */
> +    ret = bdrv_flush(bs);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    s->dirty_bitmaps_offset = dirty_bitmaps_offset;
> +    s->dirty_bitmaps_size = dirty_bitmaps_size;
> +    ret = qcow2_update_header(bs);
> +    if (ret < 0) {
> +        fprintf(stderr, "Could not update qcow2 header\n");
> +        goto fail;
> +    }

qcow2_update_header() does not flush.  We need to flush before freeing
the old clusters in order to guarantee that the file now points to the
new clusters.

> +uint8_t *qcow2_dirty_bitmap_load(BlockDriverState *bs,
> +                            const char *name, uint64_t size,
> +                            int granularity)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int i, dirty_bitmap_index, ret;
> +    uint64_t offset;
> +    QCowDirtyBitmap *bm;
> +    uint64_t *l1_table;
> +    uint8_t *buf;
> +
> +    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
> +    if (dirty_bitmap_index < 0) {
> +        return NULL;
> +    }
> +    bm = &s->dirty_bitmaps[dirty_bitmap_index];
> +
> +    if (size != bm->bitmap_size || granularity != bm->bitmap_granularity) {
> +        return NULL;
> +    }
> +
> +    l1_table = g_malloc(bm->l1_size * sizeof(uint64_t));

Please use g_try_malloc() with NULL handling.

> +    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
> +                     bm->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    buf = g_malloc0(bm->l1_size * s->cluster_size);

What is the maximum l1_size value?  cluster_size and l1_size are 32-bit
so with 64 KB cluster_size this overflows if l1_size > 65535.  Do you
want to cast to size_t?

> +    for (i = 0; i < bm->l1_size; ++i) {
> +        offset = be64_to_cpu(l1_table[i]);
> +        if (!(offset & 1)) {

This doesn't honor the 0 offset means unallocated cluster behavior for
the Standard Cluster Descriptor from the qcow2 specification.

> +            ret = bdrv_pread(bs->file, offset, buf + i * s->cluster_size,
> +                             s->cluster_size);
> +            if (ret < 0) {
> +                goto fail;

Missing g_free(buf)

> +    l1_table = g_try_new(uint64_t, bm->l1_size);
> +    if (l1_table == NULL) {
> +        ret = -ENOMEM;
> +        goto fail;
> +    }
> +
> +    /* initialize with zero clusters */
> +    for (i = 0; i < s->l1_size; i++) {
> +        l1_table[i] = cpu_to_be64(1);
> +    }
> +
> +    ret = qcow2_pre_write_overlap_check(bs, 0, bm->l1_table_offset,
> +                                        s->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    ret = bdrv_pwrite(bs->file, bm->l1_table_offset, l1_table,
> +                      s->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        goto fail;
> +    }

Flush is needed here to ensure the bitmap has reached disk before the
dirty_bitmaps array is written out.

> +
> +    g_free(l1_table);
> +    l1_table = NULL;
> +
> +    /* Append the new dirty bitmap to the dirty bitmap list */
> +    new_dirty_bitmap_list = g_new(QCowDirtyBitmap, s->nb_dirty_bitmaps + 1);
> +    if (s->dirty_bitmaps) {
> +        memcpy(new_dirty_bitmap_list, s->dirty_bitmaps,
> +               s->nb_dirty_bitmaps * sizeof(QCowDirtyBitmap));
> +        old_dirty_bitmap_list = s->dirty_bitmaps;
> +    }
> +    s->dirty_bitmaps = new_dirty_bitmap_list;
> +    s->dirty_bitmaps[s->nb_dirty_bitmaps++] = *bm;
> +
> +    ret = qcow2_write_dirty_bitmaps(bs);
> +    if (ret < 0) {
> +        g_free(s->dirty_bitmaps);
> +        s->dirty_bitmaps = old_dirty_bitmap_list;
> +        s->nb_dirty_bitmaps--;
> +        goto fail;
> +    }
> +
> +    g_free(old_dirty_bitmap_list);
> +
> +    return 0;
> +
> +fail:
> +    g_free(bm->name);
> +    g_free(l1_table);
> +

The l1_table clusters should be freed on failure.

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
  2015-06-08 15:21 [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps Vladimir Sementsov-Ogievskiy
                   ` (7 preceding siblings ...)
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 8/8] iotests: test internal persistent dirty bitmap Vladimir Sementsov-Ogievskiy
@ 2015-06-10 15:27 ` Stefan Hajnoczi
  2015-06-11 11:22   ` Vladimir Sementsov-Ogievskiy
  2015-06-11 20:06 ` Stefan Hajnoczi
  2015-06-12 19:34 ` John Snow
  10 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2015-06-10 15:27 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: kwolf, qemu-devel, pbonzini, den, jsnow

[-- Attachment #1: Type: text/plain, Size: 840 bytes --]

On Mon, Jun 08, 2015 at 06:21:18PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> QCow2 header is extended by fields 'nb_dirty_bitmaps' and
> 'dirty_bitmaps_offset' like with snapshots.
> 
> Proposed command line syntax is the following:
> 
> -dirty-bitmap [option1=val1][,option2=val2]...

Two questions:

1. How does this code ensure that the dirty bitmap is consistent after
crash/power failure?

At the minimum, enabled dirty bitmaps must be discarded after
crash/power failure if we cannot guarantee they are up-to-date.  It's
worse to rely on an outdated dirty bitmap than to detect failure and
start afresh.

2. How do persistent dirty bitmaps work with live migration?  Remember
there are two storage cases for live migration: shared storage (NAS or
SAN) and non-shared storage (disk images must be copied over).

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification Vladimir Sementsov-Ogievskiy
  2015-06-09 16:01   ` John Snow
  2015-06-09 17:03   ` Stefan Hajnoczi
@ 2015-06-10 15:34   ` Kevin Wolf
  2015-06-11 10:25     ` Vladimir Sementsov-Ogievskiy
  2015-08-24 10:46     ` Vladimir Sementsov-Ogievskiy
  2015-08-24 13:30   ` Vladimir Sementsov-Ogievskiy
                     ` (2 subsequent siblings)
  5 siblings, 2 replies; 76+ messages in thread
From: Kevin Wolf @ 2015-06-10 15:34 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha, pbonzini, den,
	jsnow

Am 08.06.2015 um 17:21 hat Vladimir Sementsov-Ogievskiy geschrieben:
> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> 
> Persistent dirty bitmaps will be saved into qcow2 files. It may be used
> as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
> other drives (there may be qcow2 file with zero disk size but with
> several dirty bitmaps for other drives).
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  docs/specs/qcow2.txt | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 66 insertions(+)
> 
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index 121dfc8..0fffba2 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -123,6 +123,7 @@ be stored. Each extension has a structure like the following:
>                          0x00000000 - End of the header extension area
>                          0xE2792ACA - Backing file format name
>                          0x6803f857 - Feature name table
> +                        0x23852875 - Dirty bitmaps
>                          other      - Unknown header extension, can be safely
>                                       ignored
>  
> @@ -166,6 +167,19 @@ the header extension data. Each entry look like this:
>                      terminated if it has full length)
>  
>  
> +== Dirty bitmaps ==
> +
> +Dirty bitmaps is an optional header extension. It provides a possibility of
> +storing dirty bitmaps in qcow2 image. The fields are:
> +
> +          0 -  3:  nb_dirty_bitmaps
> +                   Number of dirty bitmaps contained in the image
> +
> +          4 - 11:  dirty_bitmaps_offset
> +                   Offset into the image file at which the dirty bitmaps table
> +                   starts. Must be aligned to a cluster boundary.
> +
> +
>  == Host cluster management ==

You need to use a compatibility flag because for old qemu versions, the
dirty bitmaps (and associated metadata) are leaked clusters and qemu-img
check would "repair" them by resetting the refcount to 0.

At second sight, I see that your patches add an autoclear flag.
Presumably the contents of the dirty bitmaps is outdated when you
accessed the image with an older version, so this seems right. We just
need to document it.

>  qcow2 manages the allocation of host clusters by maintaining a reference count
> @@ -360,3 +374,55 @@ Snapshot table entry:
>  
>          variable:   Padding to round up the snapshot table entry size to the
>                      next multiple of 8.
> +
> +
> +== Dirty bitmaps ==
> +
> +The feature supports storing several dirty bitmaps in the qcow2 file.
> +
> +=== Cluster mapping ===
> +
> +Dirty bitmaps are stored using a ONE-level structure for the mapping of
> +bitmaps to host clusters. There is only an L1 table.
> +
> +The L1 table has a variable size (stored in the Bitmap table entry) and may
> +use multiple clusters, however it must be contiguous in the image file.
> +
> +Given an offset into the bitmap, the offset into the image file can be
> +obtained as follows:
> +
> +    offset = l1_table[offset / cluster_size] + (offset % cluster_size)
> +
> +L1 table entry:
> +
> +    Bit  0 -  61:   Standard cluster descriptor
> +
> +        62 -  63:   Reserved

Stefan already mentioned that we don't have a "L1" when there is only
one level, and that you shouldn't reuse the cluster descriptors from L2
tables.

> +=== Bitmap table ===
> +
> +A directory of all bitmaps is stored in the bitmap table, a contiguous area in
> +the image file, whose starting offset and length are given by the header fields
> +dirty_bitmaps_offset and nb_dirty_bitmaps. The entries of the bitmap table have
> +variable length, depending on the length of name and extra data.
> +
> +Bitmap table entry:
> +
> +    Byte 0 -  7:    Offset into the image file at which the L1 table for the
> +                    bitmap starts. Must be aligned to a cluster boundary.
> +
> +         8 - 11:    Number of entries in the L1 table of the bitmap

Worth using 64 bits here? This can only cover 4 * 512 GB = 2 TB for the
smallest possible cluster size. Though it's 65536 * 512 = 32 PB for the
default, which might be enough for a while.

> +        12 - 15:    Bitmap granularity in bytes
> +
> +        16 - 23:    Bitmap size in sectors

Please don't use sectors, that's a meaningless unit. Bytes is better.

> +        24 - 25:    Size of the bitmap name

We should use a smaller limit than the possible 64k to avoid too large
memory allocations. Nobody needs really long bitmap names.

> +
> +        variable:   The name of the bitmap (not null terminated)
> +
> +        variable:   Padding to round up the bitmap table entry size to the
> +                    next multiple of 8.

Kevin

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 4/8] block: add bdrv_load_dirty_bitmap
  2015-06-09 16:01   ` Stefan Hajnoczi
@ 2015-06-10 22:33     ` John Snow
  2015-06-11 10:41       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 76+ messages in thread
From: John Snow @ 2015-06-10 22:33 UTC (permalink / raw)
  To: Stefan Hajnoczi, Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha, den,
	pbonzini



On 06/09/2015 12:01 PM, Stefan Hajnoczi wrote:
> On Mon, Jun 08, 2015 at 06:21:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> +BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs,
>> +                                        BlockDriverState *file,
>> +                                        int granularity,
>> +                                        const char *name,
>> +                                        Error **errp)
>> +{
>> +    BlockDriver *drv = file->drv;
>> +    if (!drv) {
>> +        return NULL;
>> +    }
>> +    if (drv->bdrv_dirty_bitmap_load) {
>> +        BdrvDirtyBitmap *bitmap;
>> +        uint64_t bitmap_size = bdrv_nb_sectors(bs);
>> +        uint8_t *buf = drv->bdrv_dirty_bitmap_load(file, name, bitmap_size,
>> +                                                   granularity);
>> +        if (buf == NULL) {
>> +            return NULL;
>> +        }
>> +
>> +        bitmap = bdrv_create_dirty_bitmap(bs, granularity, name, errp);
>> +        if (bitmap == NULL) {
>> +            g_free(buf);
>> +            return NULL;
>> +        }
>> +
>> +        hbitmap_deserialize_part(bitmap->bitmap, buf, 0, bitmap_size);
>> +        hbitmap_deserialize_finish(bitmap->bitmap);
> 
> How about passing bitmap and errp into drv->bdrv_dirty_bitmap_load?
> That way bdrv_dirty_bitmap_load() can stream using
> hbitmap_deserialize_part() and does not need to allocate the full
> bitmap.  It can also report errors properly.
> 

My hunch is that this was avoided because BdrvDirtyBitmap is currently a
structure local only to block.c, but I would be fine with shifting the
header to block_int.h and giving the BdrvDirtyBitmap some limited
exposure outside of the block core file to facilitate some cleaner
function prototypes here.

OR, you could have the qcow2 layer rely on serialization functions that
are written back here in block.c that supports feeding it out
chunk-by-chunk.

Whatever happens to feel cleaner is (probably) fine by me.

--js

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] qcow2: add autoclear bit for dirty bitmaps
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 6/8] qcow2: add autoclear bit for dirty bitmaps Vladimir Sementsov-Ogievskiy
  2015-06-09 15:49   ` Stefan Hajnoczi
  2015-06-09 15:50   ` Stefan Hajnoczi
@ 2015-06-10 23:42   ` John Snow
  2015-06-11  8:35     ` Kevin Wolf
  2015-06-11 10:49     ` Vladimir Sementsov-Ogievskiy
  2 siblings, 2 replies; 76+ messages in thread
From: John Snow @ 2015-06-10 23:42 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, pbonzini, Vladimir Sementsov-Ogievskiy, stefanha, den



On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/qcow2-dirty-bitmap.c |  5 +++++
>  block/qcow2.c              | 13 +++++++++++--
>  block/qcow2.h              |  9 +++++++++
>  3 files changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
> index db83112..686a121 100644
> --- a/block/qcow2-dirty-bitmap.c
> +++ b/block/qcow2-dirty-bitmap.c
> @@ -188,6 +188,11 @@ static int qcow2_write_dirty_bitmaps(BlockDriverState *bs)
>  
>      s->dirty_bitmaps_offset = dirty_bitmaps_offset;
>      s->dirty_bitmaps_size = dirty_bitmaps_size;
> +    if (s->nb_dirty_bitmaps > 0) {
> +        s->autoclear_features |= QCOW2_AUTOCLEAR_DIRTY_BITMAPS;
> +    } else {
> +        s->autoclear_features &= ~QCOW2_AUTOCLEAR_DIRTY_BITMAPS;
> +    }
>      ret = qcow2_update_header(bs);
>      if (ret < 0) {
>          fprintf(stderr, "Could not update qcow2 header\n");
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 406e55d..f85a55a 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -182,6 +182,14 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>                  return ret;
>              }
>  
> +            if (!(s->autoclear_features & QCOW2_AUTOCLEAR_DIRTY_BITMAPS) &&
> +                s->nb_dirty_bitmaps > 0) {
> +                ret = qcow2_delete_all_dirty_bitmaps(bs, errp);
> +                if (ret < 0) {
> +                    return ret;
> +                }
> +            }
> +
>  #ifdef DEBUG_EXT
>              printf("Qcow2: Got dirty bitmaps extension:"
>                     " offset=%" PRIu64 " nb_bitmaps=%" PRIu32 "\n",
> @@ -928,8 +936,9 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>      }
>  
>      /* Clear unknown autoclear feature bits */
> -    if (!bs->read_only && !(flags & BDRV_O_INCOMING) && s->autoclear_features) {
> -        s->autoclear_features = 0;
> +    if (!bs->read_only && !(flags & BDRV_O_INCOMING) &&
> +        (s->autoclear_features & ~QCOW2_AUTOCLEAR_MASK)) {
> +        s->autoclear_features |= QCOW2_AUTOCLEAR_MASK;

Like Stefan already mentioned, fixing this |= to &= will fix iotest 036,
which is otherwise broken by this patch.

>          ret = qcow2_update_header(bs);
>          if (ret < 0) {
>              error_setg_errno(errp, -ret, "Could not update qcow2 header");
> diff --git a/block/qcow2.h b/block/qcow2.h
> index b5e576c..14bd6f9 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -215,6 +215,15 @@ enum {
>      QCOW2_COMPAT_FEAT_MASK            = QCOW2_COMPAT_LAZY_REFCOUNTS,
>  };
>  
> +/* Autoclear feature bits */
> +enum {
> +    QCOW2_AUTOCLEAR_DIRTY_BITMAPS_BITNR = 0,
> +    QCOW2_AUTOCLEAR_DIRTY_BITMAPS       =
> +        1 << QCOW2_AUTOCLEAR_DIRTY_BITMAPS_BITNR,
> +
> +    QCOW2_AUTOCLEAR_MASK                = QCOW2_AUTOCLEAR_DIRTY_BITMAPS,
> +};
> +

I find it a little awkward to have an enum with three different kinds of
data in it, unless I am reading this incorrectly. (bit position, bit
masks, and accumulated bit mask.)

Just enumerating the indices is probably sufficient:

enum {
  QCOW2_AUTOCLEAR_BEGIN = 0,
  QCOW2_AUTOCLEAR_DIRTY_BITMAPS = QCOW2_AUTOCLEAR_BEGIN,
  ...,
  QCOW2_AUTOCLEAR_END
}

and then the QCOW2_AUTOCLEAR_MASK can either be programmatically defined
via a function, or just pre-computed as a #define.

If you still want the mask definitions, you could do something cheeky
like this:

#define AUTOCLEAR_MASK(X) (1 << QCOW2_AUTOCLEAR_ ## X)

and then you can use things like AUTOCLEAR_MASK(DIRTY_BITMAPS) without
having to create and maintain two separate tables if you want both forms
easily available.

>  enum qcow2_discard_type {
>      QCOW2_DISCARD_NEVER = 0,
>      QCOW2_DISCARD_ALWAYS,
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] qcow2: add autoclear bit for dirty bitmaps
  2015-06-10 23:42   ` John Snow
@ 2015-06-11  8:35     ` Kevin Wolf
  2015-06-11 10:49     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 76+ messages in thread
From: Kevin Wolf @ 2015-06-11  8:35 UTC (permalink / raw)
  To: John Snow
  Cc: Vladimir Sementsov-Ogievskiy, qemu-devel,
	Vladimir Sementsov-Ogievskiy, stefanha, den, pbonzini

Am 11.06.2015 um 01:42 hat John Snow geschrieben:
> 
> 
> On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
> > From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> > 
> > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > ---
> >  block/qcow2-dirty-bitmap.c |  5 +++++
> >  block/qcow2.c              | 13 +++++++++++--
> >  block/qcow2.h              |  9 +++++++++
> >  3 files changed, 25 insertions(+), 2 deletions(-)
> > 
> > diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
> > index db83112..686a121 100644
> > --- a/block/qcow2-dirty-bitmap.c
> > +++ b/block/qcow2-dirty-bitmap.c
> > @@ -188,6 +188,11 @@ static int qcow2_write_dirty_bitmaps(BlockDriverState *bs)
> >  
> >      s->dirty_bitmaps_offset = dirty_bitmaps_offset;
> >      s->dirty_bitmaps_size = dirty_bitmaps_size;
> > +    if (s->nb_dirty_bitmaps > 0) {
> > +        s->autoclear_features |= QCOW2_AUTOCLEAR_DIRTY_BITMAPS;
> > +    } else {
> > +        s->autoclear_features &= ~QCOW2_AUTOCLEAR_DIRTY_BITMAPS;
> > +    }
> >      ret = qcow2_update_header(bs);
> >      if (ret < 0) {
> >          fprintf(stderr, "Could not update qcow2 header\n");
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index 406e55d..f85a55a 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -182,6 +182,14 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
> >                  return ret;
> >              }
> >  
> > +            if (!(s->autoclear_features & QCOW2_AUTOCLEAR_DIRTY_BITMAPS) &&
> > +                s->nb_dirty_bitmaps > 0) {
> > +                ret = qcow2_delete_all_dirty_bitmaps(bs, errp);
> > +                if (ret < 0) {
> > +                    return ret;
> > +                }
> > +            }
> > +
> >  #ifdef DEBUG_EXT
> >              printf("Qcow2: Got dirty bitmaps extension:"
> >                     " offset=%" PRIu64 " nb_bitmaps=%" PRIu32 "\n",
> > @@ -928,8 +936,9 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
> >      }
> >  
> >      /* Clear unknown autoclear feature bits */
> > -    if (!bs->read_only && !(flags & BDRV_O_INCOMING) && s->autoclear_features) {
> > -        s->autoclear_features = 0;
> > +    if (!bs->read_only && !(flags & BDRV_O_INCOMING) &&
> > +        (s->autoclear_features & ~QCOW2_AUTOCLEAR_MASK)) {
> > +        s->autoclear_features |= QCOW2_AUTOCLEAR_MASK;
> 
> Like Stefan already mentioned, fixing this |= to &= will fix iotest 036,
> which is otherwise broken by this patch.
> 
> >          ret = qcow2_update_header(bs);
> >          if (ret < 0) {
> >              error_setg_errno(errp, -ret, "Could not update qcow2 header");
> > diff --git a/block/qcow2.h b/block/qcow2.h
> > index b5e576c..14bd6f9 100644
> > --- a/block/qcow2.h
> > +++ b/block/qcow2.h
> > @@ -215,6 +215,15 @@ enum {
> >      QCOW2_COMPAT_FEAT_MASK            = QCOW2_COMPAT_LAZY_REFCOUNTS,
> >  };
> >  
> > +/* Autoclear feature bits */
> > +enum {
> > +    QCOW2_AUTOCLEAR_DIRTY_BITMAPS_BITNR = 0,
> > +    QCOW2_AUTOCLEAR_DIRTY_BITMAPS       =
> > +        1 << QCOW2_AUTOCLEAR_DIRTY_BITMAPS_BITNR,
> > +
> > +    QCOW2_AUTOCLEAR_MASK                = QCOW2_AUTOCLEAR_DIRTY_BITMAPS,
> > +};
> > +
> 
> I find it a little awkward to have an enum with three different kinds of
> data in it, unless I am reading this incorrectly. (bit position, bit
> masks, and accumulated bit mask.)

This is only consistent with the enums for incompatible and compatible
feature flags. If we were to change that, we should change it
everywhere.

> Just enumerating the indices is probably sufficient:
> 
> enum {
>   QCOW2_AUTOCLEAR_BEGIN = 0,
>   QCOW2_AUTOCLEAR_DIRTY_BITMAPS = QCOW2_AUTOCLEAR_BEGIN,
>   ...,
>   QCOW2_AUTOCLEAR_END
> }

I don't mind the colour of the bikeshed, as long as all constants are
explicitly defined. Letting the compiler assign integers when these
integers are part of an external interface is too easy to break
accidentally.

Kevin

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-10 13:00       ` Eric Blake
@ 2015-06-11 10:16         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-11 10:16 UTC (permalink / raw)
  To: Eric Blake, Stefan Hajnoczi
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha,
	pbonzini, den, jsnow

On 10.06.2015 16:00, Eric Blake wrote:
> On 06/10/2015 02:19 AM, Vladimir Sementsov-Ogievskiy wrote:
>
>>>> +Dirty bitmaps is an optional header extension. It provides a
>>>> possibility of
>>>> +storing dirty bitmaps in qcow2 image. The fields are:
>>>> +
>>>> +          0 -  3:  nb_dirty_bitmaps
>>>> +                   Number of dirty bitmaps contained in the image
>>> Is there a maximum?
>> hmm. any proposals for this?
>>>> +
>>>> +          4 - 11:  dirty_bitmaps_offset
> I'm not sure if there is a reasonable cap on the number of dirty
> bitmaps; I doubt that anyone will actually supply all 4G possible images
> allowed by the four-byte field, but don't have a suggestion on a smaller
> limit that doesn't feel arbitrary.
>
> [meta-comment] It's very hard to pick out the new content in your reply
> if you do not separate your new text with a newline both before and
> after (as I'm doing here).
>
>
>>>> +Dirty bitmaps are stored using a ONE-level structure for the mapping of
>>>> +bitmaps to host clusters. There is only an L1 table.
>>>> +
>>>> +The L1 table has a variable size (stored in the Bitmap table entry)
>>>> and may
>>>> +use multiple clusters, however it must be contiguous in the image file.
>>> The use of "L1 table" could be confusing.  The refcount metadata uses
>>> "refcount table" and "refcount block" to describe a one-level table.
>> I agree. Hmm.. dirty bitmaps table? ok?
> "dirty bitmaps table" works for me, as a name for the one-level table.
>

for now, dirty bitmaps table is the table with bitmap descriptors, and 
each bitmap descriptor contains its own l1 table..
What about dirty bitmap directory for descriptors and dirty bitmap table 
for l1? like pde pte)

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-10 13:24       ` Stefan Hajnoczi
@ 2015-06-11 10:19         ` Vladimir Sementsov-Ogievskiy
  2015-06-11 13:03           ` Stefan Hajnoczi
  0 siblings, 1 reply; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-11 10:19 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, Stefan Hajnoczi, qemu-devel, Vladimir Sementsov-Ogievskiy,
	pbonzini, den, jsnow

On 10.06.2015 16:24, Stefan Hajnoczi wrote:
> On Wed, Jun 10, 2015 at 11:19:30AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 09.06.2015 20:03, Stefan Hajnoczi wrote:
>>> On Mon, Jun 08, 2015 at 06:21:19PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> @@ -166,6 +167,19 @@ the header extension data. Each entry look like this:
>>>>                       terminated if it has full length)
>>>> +== Dirty bitmaps ==
>>>> +
>>>> +Dirty bitmaps is an optional header extension. It provides a possibility of
>>>> +storing dirty bitmaps in qcow2 image. The fields are:
>>>> +
>>>> +          0 -  3:  nb_dirty_bitmaps
>>>> +                   Number of dirty bitmaps contained in the image
>>> Is there a maximum?
>> hmm. any proposals for this?
> 65535 seems practical.

So, you suggest to reduce this field width to 2b? And additional 2 bytes 
reserved field, to achieve 8b-alignment?

>
>>>> +=== Cluster mapping ===
>>>> +
>>>> +Dirty bitmaps are stored using a ONE-level structure for the mapping of
>>>> +bitmaps to host clusters. There is only an L1 table.
>>>> +
>>>> +The L1 table has a variable size (stored in the Bitmap table entry) and may
>>>> +use multiple clusters, however it must be contiguous in the image file.
>>> The use of "L1 table" could be confusing.  The refcount metadata uses
>>> "refcount table" and "refcount block" to describe a one-level table.
>> I agree. Hmm.. dirty bitmaps table? ok?
> Yes, that is good.
>
>>>> +
>>>> +Given an offset into the bitmap, the offset into the image file can be
>>>> +obtained as follows:
>>>> +
>>>> +    offset = l1_table[offset / cluster_size] + (offset % cluster_size)
>>> It might help to add granularity to this formula.
>>>
>>> Instead of "offset", "bit_number" or "bitnr" might be clearer since
>>> "offset" means something different in other parts of the document.
>> Hmm. In my opinion, the bitmap here is stored as raw data. And granularity
>> is an additional parameter (for deserializing this data). So, it is an
>> offset in bytes for this data. The format is not for accessing bitmap bits,
>> it's only for loading the whole bitmap one time.
> You are right, it wasn't clear when I read this the first time.  My
> problem was the "offset into the bitmap" doesn't have any units.  So
> let's make this more explicit.  Can you document how to go from a bit
> number down to the offset?


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-10 15:34   ` Kevin Wolf
@ 2015-06-11 10:25     ` Vladimir Sementsov-Ogievskiy
  2015-06-11 16:30       ` John Snow
  2015-08-24 10:46     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-11 10:25 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha, pbonzini, den,
	jsnow

On 10.06.2015 18:34, Kevin Wolf wrote:
> Am 08.06.2015 um 17:21 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>
>> Persistent dirty bitmaps will be saved into qcow2 files. It may be used
>> as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
>> other drives (there may be qcow2 file with zero disk size but with
>> several dirty bitmaps for other drives).
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   docs/specs/qcow2.txt | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 66 insertions(+)
>>
>> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
>> index 121dfc8..0fffba2 100644
>> --- a/docs/specs/qcow2.txt
>> +++ b/docs/specs/qcow2.txt
>> @@ -123,6 +123,7 @@ be stored. Each extension has a structure like the following:
>>                           0x00000000 - End of the header extension area
>>                           0xE2792ACA - Backing file format name
>>                           0x6803f857 - Feature name table
>> +                        0x23852875 - Dirty bitmaps
>>                           other      - Unknown header extension, can be safely
>>                                        ignored
>>   
>> @@ -166,6 +167,19 @@ the header extension data. Each entry look like this:
>>                       terminated if it has full length)
>>   
>>   
>> +== Dirty bitmaps ==
>> +
>> +Dirty bitmaps is an optional header extension. It provides a possibility of
>> +storing dirty bitmaps in qcow2 image. The fields are:
>> +
>> +          0 -  3:  nb_dirty_bitmaps
>> +                   Number of dirty bitmaps contained in the image
>> +
>> +          4 - 11:  dirty_bitmaps_offset
>> +                   Offset into the image file at which the dirty bitmaps table
>> +                   starts. Must be aligned to a cluster boundary.
>> +
>> +
>>   == Host cluster management ==
> You need to use a compatibility flag because for old qemu versions, the
> dirty bitmaps (and associated metadata) are leaked clusters and qemu-img
> check would "repair" them by resetting the refcount to 0.
>
> At second sight, I see that your patches add an autoclear flag.
> Presumably the contents of the dirty bitmaps is outdated when you
> accessed the image with an older version, so this seems right. We just
> need to document it.
>
>>   qcow2 manages the allocation of host clusters by maintaining a reference count
>> @@ -360,3 +374,55 @@ Snapshot table entry:
>>   
>>           variable:   Padding to round up the snapshot table entry size to the
>>                       next multiple of 8.
>> +
>> +
>> +== Dirty bitmaps ==
>> +
>> +The feature supports storing several dirty bitmaps in the qcow2 file.
>> +
>> +=== Cluster mapping ===
>> +
>> +Dirty bitmaps are stored using a ONE-level structure for the mapping of
>> +bitmaps to host clusters. There is only an L1 table.
>> +
>> +The L1 table has a variable size (stored in the Bitmap table entry) and may
>> +use multiple clusters, however it must be contiguous in the image file.
>> +
>> +Given an offset into the bitmap, the offset into the image file can be
>> +obtained as follows:
>> +
>> +    offset = l1_table[offset / cluster_size] + (offset % cluster_size)
>> +
>> +L1 table entry:
>> +
>> +    Bit  0 -  61:   Standard cluster descriptor
>> +
>> +        62 -  63:   Reserved
> Stefan already mentioned that we don't have a "L1" when there is only
> one level, and that you shouldn't reuse the cluster descriptors from L2
> tables.
>
>> +=== Bitmap table ===
>> +
>> +A directory of all bitmaps is stored in the bitmap table, a contiguous area in
>> +the image file, whose starting offset and length are given by the header fields
>> +dirty_bitmaps_offset and nb_dirty_bitmaps. The entries of the bitmap table have
>> +variable length, depending on the length of name and extra data.
>> +
>> +Bitmap table entry:
>> +
>> +    Byte 0 -  7:    Offset into the image file at which the L1 table for the
>> +                    bitmap starts. Must be aligned to a cluster boundary.
>> +
>> +         8 - 11:    Number of entries in the L1 table of the bitmap
> Worth using 64 bits here? This can only cover 4 * 512 GB = 2 TB for the
> smallest possible cluster size. Though it's 65536 * 512 = 32 PB for the
> default, which might be enough for a while.
>
>> +        12 - 15:    Bitmap granularity in bytes
>> +
>> +        16 - 23:    Bitmap size in sectors
> Please don't use sectors, that's a meaningless unit. Bytes is better.
Just bad description. Actually it is ~ (number of bits in bitmap * 
granularity), and it is corresponding to number of sectors in the image.
>
>> +        24 - 25:    Size of the bitmap name
> We should use a smaller limit than the possible 64k to avoid too large
> memory allocations. Nobody needs really long bitmap names.
>
>> +
>> +        variable:   The name of the bitmap (not null terminated)
>> +
>> +        variable:   Padding to round up the bitmap table entry size to the
>> +                    next multiple of 8.
> Kevin


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 4/8] block: add bdrv_load_dirty_bitmap
  2015-06-10 22:33     ` John Snow
@ 2015-06-11 10:41       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-11 10:41 UTC (permalink / raw)
  To: John Snow, Stefan Hajnoczi
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha, den,
	pbonzini

On 11.06.2015 01:33, John Snow wrote:
>
> On 06/09/2015 12:01 PM, Stefan Hajnoczi wrote:
>> On Mon, Jun 08, 2015 at 06:21:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>> +BdrvDirtyBitmap *bdrv_load_dirty_bitmap(BlockDriverState *bs,
>>> +                                        BlockDriverState *file,
>>> +                                        int granularity,
>>> +                                        const char *name,
>>> +                                        Error **errp)
>>> +{
>>> +    BlockDriver *drv = file->drv;
>>> +    if (!drv) {
>>> +        return NULL;
>>> +    }
>>> +    if (drv->bdrv_dirty_bitmap_load) {
>>> +        BdrvDirtyBitmap *bitmap;
>>> +        uint64_t bitmap_size = bdrv_nb_sectors(bs);
>>> +        uint8_t *buf = drv->bdrv_dirty_bitmap_load(file, name, bitmap_size,
>>> +                                                   granularity);
>>> +        if (buf == NULL) {
>>> +            return NULL;
>>> +        }
>>> +
>>> +        bitmap = bdrv_create_dirty_bitmap(bs, granularity, name, errp);
>>> +        if (bitmap == NULL) {
>>> +            g_free(buf);
>>> +            return NULL;
>>> +        }
>>> +
>>> +        hbitmap_deserialize_part(bitmap->bitmap, buf, 0, bitmap_size);
>>> +        hbitmap_deserialize_finish(bitmap->bitmap);
>> How about passing bitmap and errp into drv->bdrv_dirty_bitmap_load?
>> That way bdrv_dirty_bitmap_load() can stream using
>> hbitmap_deserialize_part() and does not need to allocate the full
>> bitmap.  It can also report errors properly.
>>
> My hunch is that this was avoided because BdrvDirtyBitmap is currently a
> structure local only to block.c, but I would be fine with shifting the
> header to block_int.h and giving the BdrvDirtyBitmap some limited
> exposure outside of the block core file to facilitate some cleaner
> function prototypes here.
>
> OR, you could have the qcow2 layer rely on serialization functions that
> are written back here in block.c that supports feeding it out
> chunk-by-chunk.
>
> Whatever happens to feel cleaner is (probably) fine by me.
>
> --js
I'll just use bdrv_dirty_bitmap_deserialize_part, etc, which are already 
in the code.

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] qcow2: add autoclear bit for dirty bitmaps
  2015-06-10 23:42   ` John Snow
  2015-06-11  8:35     ` Kevin Wolf
@ 2015-06-11 10:49     ` Vladimir Sementsov-Ogievskiy
  2015-06-11 16:36       ` John Snow
  1 sibling, 1 reply; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-11 10:49 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, pbonzini, Vladimir Sementsov-Ogievskiy, stefanha, den

On 11.06.2015 02:42, John Snow wrote:
>
> On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
>> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/qcow2-dirty-bitmap.c |  5 +++++
>>   block/qcow2.c              | 13 +++++++++++--
>>   block/qcow2.h              |  9 +++++++++
>>   3 files changed, 25 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
>> index db83112..686a121 100644
>> --- a/block/qcow2-dirty-bitmap.c
>> +++ b/block/qcow2-dirty-bitmap.c
>> @@ -188,6 +188,11 @@ static int qcow2_write_dirty_bitmaps(BlockDriverState *bs)
>>   
>>       s->dirty_bitmaps_offset = dirty_bitmaps_offset;
>>       s->dirty_bitmaps_size = dirty_bitmaps_size;
>> +    if (s->nb_dirty_bitmaps > 0) {
>> +        s->autoclear_features |= QCOW2_AUTOCLEAR_DIRTY_BITMAPS;
>> +    } else {
>> +        s->autoclear_features &= ~QCOW2_AUTOCLEAR_DIRTY_BITMAPS;
>> +    }
>>       ret = qcow2_update_header(bs);
>>       if (ret < 0) {
>>           fprintf(stderr, "Could not update qcow2 header\n");
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index 406e55d..f85a55a 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -182,6 +182,14 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>>                   return ret;
>>               }
>>   
>> +            if (!(s->autoclear_features & QCOW2_AUTOCLEAR_DIRTY_BITMAPS) &&
>> +                s->nb_dirty_bitmaps > 0) {
>> +                ret = qcow2_delete_all_dirty_bitmaps(bs, errp);
>> +                if (ret < 0) {
>> +                    return ret;
>> +                }
>> +            }
>> +
>>   #ifdef DEBUG_EXT
>>               printf("Qcow2: Got dirty bitmaps extension:"
>>                      " offset=%" PRIu64 " nb_bitmaps=%" PRIu32 "\n",
>> @@ -928,8 +936,9 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>>       }
>>   
>>       /* Clear unknown autoclear feature bits */
>> -    if (!bs->read_only && !(flags & BDRV_O_INCOMING) && s->autoclear_features) {
>> -        s->autoclear_features = 0;
>> +    if (!bs->read_only && !(flags & BDRV_O_INCOMING) &&
>> +        (s->autoclear_features & ~QCOW2_AUTOCLEAR_MASK)) {
>> +        s->autoclear_features |= QCOW2_AUTOCLEAR_MASK;
> Like Stefan already mentioned, fixing this |= to &= will fix iotest 036,
> which is otherwise broken by this patch.
>
>>           ret = qcow2_update_header(bs);
>>           if (ret < 0) {
>>               error_setg_errno(errp, -ret, "Could not update qcow2 header");
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index b5e576c..14bd6f9 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -215,6 +215,15 @@ enum {
>>       QCOW2_COMPAT_FEAT_MASK            = QCOW2_COMPAT_LAZY_REFCOUNTS,
>>   };
>>   
>> +/* Autoclear feature bits */
>> +enum {
>> +    QCOW2_AUTOCLEAR_DIRTY_BITMAPS_BITNR = 0,
>> +    QCOW2_AUTOCLEAR_DIRTY_BITMAPS       =
>> +        1 << QCOW2_AUTOCLEAR_DIRTY_BITMAPS_BITNR,
>> +
>> +    QCOW2_AUTOCLEAR_MASK                = QCOW2_AUTOCLEAR_DIRTY_BITMAPS,
>> +};
>> +
> I find it a little awkward to have an enum with three different kinds of
> data in it, unless I am reading this incorrectly. (bit position, bit
> masks, and accumulated bit mask.)
>
> Just enumerating the indices is probably sufficient:
>
> enum {
>    QCOW2_AUTOCLEAR_BEGIN = 0,
>    QCOW2_AUTOCLEAR_DIRTY_BITMAPS = QCOW2_AUTOCLEAR_BEGIN,
>    ...,
>    QCOW2_AUTOCLEAR_END
> }
>
> and then the QCOW2_AUTOCLEAR_MASK can either be programmatically defined
> via a function, or just pre-computed as a #define.
>
> If you still want the mask definitions, you could do something cheeky
> like this:
>
> #define AUTOCLEAR_MASK(X) (1 << QCOW2_AUTOCLEAR_ ## X)
>
> and then you can use things like AUTOCLEAR_MASK(DIRTY_BITMAPS) without
> having to create and maintain two separate tables if you want both forms
> easily available.


This enum is made like enums for  QCOW2_INCOMPAT_* and QCOW2_COMPAT_*, 
which are already in the code... Then, may I make a patch for them too? 
I agree, it is strange solution to put things of different nature to one 
enum.


>
>>   enum qcow2_discard_type {
>>       QCOW2_DISCARD_NEVER = 0,
>>       QCOW2_DISCARD_ALWAYS,
>>


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
  2015-06-10 15:27 ` [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps Stefan Hajnoczi
@ 2015-06-11 11:22   ` Vladimir Sementsov-Ogievskiy
  2015-06-11 13:14     ` Stefan Hajnoczi
  0 siblings, 1 reply; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-11 11:22 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, qemu-devel, pbonzini, den, jsnow

On 10.06.2015 18:27, Stefan Hajnoczi wrote:
> On Mon, Jun 08, 2015 at 06:21:18PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> QCow2 header is extended by fields 'nb_dirty_bitmaps' and
>> 'dirty_bitmaps_offset' like with snapshots.
>>
>> Proposed command line syntax is the following:
>>
>> -dirty-bitmap [option1=val1][,option2=val2]...
> Two questions:
>
> 1. How does this code ensure that the dirty bitmap is consistent after
> crash/power failure?

It's not done yet. What about consistent ('dirty' is not very good for 
dirty bitmaps=) flag for every bitmap? Set it on save and unset on load..

>
> At the minimum, enabled dirty bitmaps must be discarded after
> crash/power failure if we cannot guarantee they are up-to-date.  It's
> worse to rely on an outdated dirty bitmap than to detect failure and
> start afresh.
>
> 2. How do persistent dirty bitmaps work with live migration?  Remember
> there are two storage cases for live migration: shared storage (NAS or
> SAN) and non-shared storage (disk images must be copied over).
For now:
Only loaded bitmaps are migrated.
So, for shared image, all is ok: loaded bitmaps are migrated (in 
migration, if there is a bitmap with same name, size and granularity on 
destination, then it will be transparently used as destination bitmap), 
not loaded bitmaps are the same in the image.
For non-shared storage, not loaded bitmaps are not migrated at all. 
Hmm.. is it bad? Looks like so.

I can add a function to load all not loaded bitmaps from the image in 
disabled state. Then variants:
1) call it automatically before migration
2) add a cmd parameter, to load 'all other bitmaps' in disabled state
3) always load all available bitmaps.

(1), (3) are bad I think, because bitmaps may be stored in separate file 
(especially for non-qcow2 images), and, if this file is not mentioned in 
cmd (all bitmap are not loaded), then there is no possibility of 
migrating them automatically.


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-11 10:19         ` Vladimir Sementsov-Ogievskiy
@ 2015-06-11 13:03           ` Stefan Hajnoczi
  2015-06-11 16:21             ` John Snow
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2015-06-11 13:03 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, Stefan Hajnoczi,
	den, pbonzini, jsnow

[-- Attachment #1: Type: text/plain, Size: 1140 bytes --]

On Thu, Jun 11, 2015 at 01:19:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 10.06.2015 16:24, Stefan Hajnoczi wrote:
> >On Wed, Jun 10, 2015 at 11:19:30AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> >>On 09.06.2015 20:03, Stefan Hajnoczi wrote:
> >>>On Mon, Jun 08, 2015 at 06:21:19PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> >>>>@@ -166,6 +167,19 @@ the header extension data. Each entry look like this:
> >>>>                      terminated if it has full length)
> >>>>+== Dirty bitmaps ==
> >>>>+
> >>>>+Dirty bitmaps is an optional header extension. It provides a possibility of
> >>>>+storing dirty bitmaps in qcow2 image. The fields are:
> >>>>+
> >>>>+          0 -  3:  nb_dirty_bitmaps
> >>>>+                   Number of dirty bitmaps contained in the image
> >>>Is there a maximum?
> >>hmm. any proposals for this?
> >65535 seems practical.
> 
> So, you suggest to reduce this field width to 2b? And additional 2 bytes
> reserved field, to achieve 8b-alignment?

No, I would leave it 32-bit but impose a little (which can be increased
later if necessary).  That's how nb_snapshots works too.

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
  2015-06-11 11:22   ` Vladimir Sementsov-Ogievskiy
@ 2015-06-11 13:14     ` Stefan Hajnoczi
  0 siblings, 0 replies; 76+ messages in thread
From: Stefan Hajnoczi @ 2015-06-11 13:14 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-devel, Stefan Hajnoczi, den, pbonzini, jsnow

[-- Attachment #1: Type: text/plain, Size: 2386 bytes --]

On Thu, Jun 11, 2015 at 02:22:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 10.06.2015 18:27, Stefan Hajnoczi wrote:
> >On Mon, Jun 08, 2015 at 06:21:18PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> >>QCow2 header is extended by fields 'nb_dirty_bitmaps' and
> >>'dirty_bitmaps_offset' like with snapshots.
> >>
> >>Proposed command line syntax is the following:
> >>
> >>-dirty-bitmap [option1=val1][,option2=val2]...
> >Two questions:
> >
> >1. How does this code ensure that the dirty bitmap is consistent after
> >crash/power failure?
> 
> It's not done yet. What about consistent ('dirty' is not very good for dirty
> bitmaps=) flag for every bitmap? Set it on save and unset on load..

Okay.  The consistency issue is key to dirty bitmaps so it must be
addressed before we merge patches.

Other terms to describe the flag: "in_use" or "outdated"

> >
> >At the minimum, enabled dirty bitmaps must be discarded after
> >crash/power failure if we cannot guarantee they are up-to-date.  It's
> >worse to rely on an outdated dirty bitmap than to detect failure and
> >start afresh.
> >
> >2. How do persistent dirty bitmaps work with live migration?  Remember
> >there are two storage cases for live migration: shared storage (NAS or
> >SAN) and non-shared storage (disk images must be copied over).
> For now:
> Only loaded bitmaps are migrated.

I see.  That is probably fine.

> So, for shared image, all is ok: loaded bitmaps are migrated (in migration,
> if there is a bitmap with same name, size and granularity on destination,
> then it will be transparently used as destination bitmap), not loaded
> bitmaps are the same in the image.

Code might be necessary to ensure that:

1. The source host does not store the bitmap after successful live
   migration handover.  (It could overwrite new data with old data!)

2. The destination host does not discard an "in_use" bitmap when it
   opens the qcow2 file before migration handover.

> For non-shared storage, not loaded bitmaps are not migrated at all. Hmm.. is
> it bad? Looks like so.

That's probably okay since loaded bitmaps are migrated.  Non-shared
storage migration only transfers the contents of attached disks, it does
not transfer qcow2 internal snapshots, for example.  So the current
behavior is consistent with qcow2 non-shared storage semantics.

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-11 13:03           ` Stefan Hajnoczi
@ 2015-06-11 16:21             ` John Snow
  2015-06-12 10:28               ` Stefan Hajnoczi
  0 siblings, 1 reply; 76+ messages in thread
From: John Snow @ 2015-06-11 16:21 UTC (permalink / raw)
  To: Stefan Hajnoczi, Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, Stefan Hajnoczi,
	den, pbonzini

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256



On 06/11/2015 09:03 AM, Stefan Hajnoczi wrote:
> On Thu, Jun 11, 2015 at 01:19:24PM +0300, Vladimir
> Sementsov-Ogievskiy wrote:
>> On 10.06.2015 16:24, Stefan Hajnoczi wrote:
>>> On Wed, Jun 10, 2015 at 11:19:30AM +0300, Vladimir
>>> Sementsov-Ogievskiy wrote:
>>>> On 09.06.2015 20:03, Stefan Hajnoczi wrote:
>>>>> On Mon, Jun 08, 2015 at 06:21:19PM +0300, Vladimir
>>>>> Sementsov-Ogievskiy wrote:
>>>>>> @@ -166,6 +167,19 @@ the header extension data. Each
>>>>>> entry look like this: terminated if it has full length) 
>>>>>> +== Dirty bitmaps == + +Dirty bitmaps is an optional
>>>>>> header extension. It provides a possibility of +storing
>>>>>> dirty bitmaps in qcow2 image. The fields are: + +
>>>>>> 0 -  3:  nb_dirty_bitmaps +                   Number of
>>>>>> dirty bitmaps contained in the image
>>>>> Is there a maximum?
>>>> hmm. any proposals for this?
>>> 65535 seems practical.
>> 
>> So, you suggest to reduce this field width to 2b? And additional
>> 2 bytes reserved field, to achieve 8b-alignment?
> 
> No, I would leave it 32-bit but impose a little (which can be
> increased later if necessary).  That's how nb_snapshots works too.
> 

Doesn't the code already limit the number of bitmaps via +#define
QCOW_MAX_DIRTY_BITMAPS 65536, from patch 2?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJVebWLAAoJEH3vgQaq/DkOJCYP/jUSJT+jhb3+GvAtddCssyYR
u1BHZacXyTsDTwX4WDZQ5eGEZJZeSwu7++w5N+m+62yDxervarfEE0G/nuGRSNWx
0zYF0RrlYZFdDqed18rgXJJjCtNo1jp67ojk+xpEBUMx9cgFa6s+BTkrY0h+4hiO
V3mvU0H1+8by1Ss5lvziKCHdrksGyBIS4gw+WZNshdOc46/nBZfSlh6CWmtOO/5S
XZwXLKE7QMJMzigdcLJBOlymRwnF094Myklf8fZQILgbdoHoKhEEj9gVWkSpoNk9
FkMDDS1qN5vtYy5Ehzwy9QpbsN5ZEhuHoj5N8k0vDfFHgB9KKvOChvxf2lVhgbz7
fvGpqUb4eEdTvRno9V+8KoEcs99JXLvhed8LrfcZzq05WKbLeAdXYj18QrDw8pdY
Fl4kV5Ca4dpvDAcNZDlCKERv+STLh56hYXEYtjzNEXL+ryQwUyHetY/M6Qodq0j2
FtJq21aj68vEOovQQcX2QxqRxkPzDEvNPbM+phBOh2FjQkbvB6I5bs/ueloyi2q9
UtXWhR6ImUgA6LN25OIc6GS9xYJsFiQlLh1uI/bJoDEpQvVnMojAXE7SohyTya89
2+HIGJsdkbBZsc4SN1INqcsRCeN1at8KiwdIbAijrciF9WIsv0kUEvCvmA93UVYp
s2Os9g5QgMXrK1icCK5J
=CIuZ
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-11 10:25     ` Vladimir Sementsov-Ogievskiy
@ 2015-06-11 16:30       ` John Snow
  2015-06-12  8:33         ` Kevin Wolf
  0 siblings, 1 reply; 76+ messages in thread
From: John Snow @ 2015-06-11 16:30 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Kevin Wolf
  Cc: qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha, pbonzini, den



On 06/11/2015 06:25 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 10.06.2015 18:34, Kevin Wolf wrote:
>> Am 08.06.2015 um 17:21 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>>
>>> Persistent dirty bitmaps will be saved into qcow2 files. It may be used
>>> as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
>>> other drives (there may be qcow2 file with zero disk size but with
>>> several dirty bitmaps for other drives).
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   docs/specs/qcow2.txt | 66
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 66 insertions(+)
>>>
>>> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
>>> index 121dfc8..0fffba2 100644
>>> --- a/docs/specs/qcow2.txt
>>> +++ b/docs/specs/qcow2.txt
>>> @@ -123,6 +123,7 @@ be stored. Each extension has a structure like
>>> the following:
>>>                           0x00000000 - End of the header extension area
>>>                           0xE2792ACA - Backing file format name
>>>                           0x6803f857 - Feature name table
>>> +                        0x23852875 - Dirty bitmaps
>>>                           other      - Unknown header extension, can
>>> be safely
>>>                                        ignored
>>>   @@ -166,6 +167,19 @@ the header extension data. Each entry look
>>> like this:
>>>                       terminated if it has full length)
>>>     +== Dirty bitmaps ==
>>> +
>>> +Dirty bitmaps is an optional header extension. It provides a
>>> possibility of
>>> +storing dirty bitmaps in qcow2 image. The fields are:
>>> +
>>> +          0 -  3:  nb_dirty_bitmaps
>>> +                   Number of dirty bitmaps contained in the image
>>> +
>>> +          4 - 11:  dirty_bitmaps_offset
>>> +                   Offset into the image file at which the dirty
>>> bitmaps table
>>> +                   starts. Must be aligned to a cluster boundary.
>>> +
>>> +
>>>   == Host cluster management ==
>> You need to use a compatibility flag because for old qemu versions, the
>> dirty bitmaps (and associated metadata) are leaked clusters and qemu-img
>> check would "repair" them by resetting the refcount to 0.
>>
>> At second sight, I see that your patches add an autoclear flag.
>> Presumably the contents of the dirty bitmaps is outdated when you
>> accessed the image with an older version, so this seems right. We just
>> need to document it.
>>
>>>   qcow2 manages the allocation of host clusters by maintaining a
>>> reference count
>>> @@ -360,3 +374,55 @@ Snapshot table entry:
>>>             variable:   Padding to round up the snapshot table entry
>>> size to the
>>>                       next multiple of 8.
>>> +
>>> +
>>> +== Dirty bitmaps ==
>>> +
>>> +The feature supports storing several dirty bitmaps in the qcow2 file.
>>> +
>>> +=== Cluster mapping ===
>>> +
>>> +Dirty bitmaps are stored using a ONE-level structure for the mapping of
>>> +bitmaps to host clusters. There is only an L1 table.
>>> +
>>> +The L1 table has a variable size (stored in the Bitmap table entry)
>>> and may
>>> +use multiple clusters, however it must be contiguous in the image file.
>>> +
>>> +Given an offset into the bitmap, the offset into the image file can be
>>> +obtained as follows:
>>> +
>>> +    offset = l1_table[offset / cluster_size] + (offset % cluster_size)
>>> +
>>> +L1 table entry:
>>> +
>>> +    Bit  0 -  61:   Standard cluster descriptor
>>> +
>>> +        62 -  63:   Reserved
>> Stefan already mentioned that we don't have a "L1" when there is only
>> one level, and that you shouldn't reuse the cluster descriptors from L2
>> tables.
>>
>>> +=== Bitmap table ===
>>> +
>>> +A directory of all bitmaps is stored in the bitmap table, a
>>> contiguous area in
>>> +the image file, whose starting offset and length are given by the
>>> header fields
>>> +dirty_bitmaps_offset and nb_dirty_bitmaps. The entries of the bitmap
>>> table have
>>> +variable length, depending on the length of name and extra data.
>>> +
>>> +Bitmap table entry:
>>> +
>>> +    Byte 0 -  7:    Offset into the image file at which the L1 table
>>> for the
>>> +                    bitmap starts. Must be aligned to a cluster
>>> boundary.
>>> +
>>> +         8 - 11:    Number of entries in the L1 table of the bitmap
>> Worth using 64 bits here? This can only cover 4 * 512 GB = 2 TB for the
>> smallest possible cluster size. Though it's 65536 * 512 = 32 PB for the
>> default, which might be enough for a while.
>>
>>> +        12 - 15:    Bitmap granularity in bytes
>>> +
>>> +        16 - 23:    Bitmap size in sectors
>> Please don't use sectors, that's a meaningless unit. Bytes is better.
> Just bad description. Actually it is ~ (number of bits in bitmap *
> granularity), and it is corresponding to number of sectors in the image.

In defense of this, it does happen to be sectors, but what it /really/
represents is the virtual addressable range of the bitmap (its 'size'),
which just-so-happens to be a sector bitmap.

We could just remove the word "sectors" entirely, and just flatly call
it the bitmap size -- but this does reveal the internal nature of the
block layer, which uses sector bitmaps.

If you wish, we can rework this field to use bytes and just convert on
every load/store into the format that we actually require. I suppose
it'd match the QMP interface in that way.

>>
>>> +        24 - 25:    Size of the bitmap name
>> We should use a smaller limit than the possible 64k to avoid too large
>> memory allocations. Nobody needs really long bitmap names.
>>
>>> +
>>> +        variable:   The name of the bitmap (not null terminated)
>>> +
>>> +        variable:   Padding to round up the bitmap table entry size
>>> to the
>>> +                    next multiple of 8.
>> Kevin
> 
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] qcow2: add autoclear bit for dirty bitmaps
  2015-06-11 10:49     ` Vladimir Sementsov-Ogievskiy
@ 2015-06-11 16:36       ` John Snow
  0 siblings, 0 replies; 76+ messages in thread
From: John Snow @ 2015-06-11 16:36 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, pbonzini, Vladimir Sementsov-Ogievskiy, stefanha, den



On 06/11/2015 06:49 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 11.06.2015 02:42, John Snow wrote:
>>
>> On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/qcow2-dirty-bitmap.c |  5 +++++
>>>   block/qcow2.c              | 13 +++++++++++--
>>>   block/qcow2.h              |  9 +++++++++
>>>   3 files changed, 25 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
>>> index db83112..686a121 100644
>>> --- a/block/qcow2-dirty-bitmap.c
>>> +++ b/block/qcow2-dirty-bitmap.c
>>> @@ -188,6 +188,11 @@ static int
>>> qcow2_write_dirty_bitmaps(BlockDriverState *bs)
>>>         s->dirty_bitmaps_offset = dirty_bitmaps_offset;
>>>       s->dirty_bitmaps_size = dirty_bitmaps_size;
>>> +    if (s->nb_dirty_bitmaps > 0) {
>>> +        s->autoclear_features |= QCOW2_AUTOCLEAR_DIRTY_BITMAPS;
>>> +    } else {
>>> +        s->autoclear_features &= ~QCOW2_AUTOCLEAR_DIRTY_BITMAPS;
>>> +    }
>>>       ret = qcow2_update_header(bs);
>>>       if (ret < 0) {
>>>           fprintf(stderr, "Could not update qcow2 header\n");
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index 406e55d..f85a55a 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>> @@ -182,6 +182,14 @@ static int
>>> qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>>>                   return ret;
>>>               }
>>>   +            if (!(s->autoclear_features &
>>> QCOW2_AUTOCLEAR_DIRTY_BITMAPS) &&
>>> +                s->nb_dirty_bitmaps > 0) {
>>> +                ret = qcow2_delete_all_dirty_bitmaps(bs, errp);
>>> +                if (ret < 0) {
>>> +                    return ret;
>>> +                }
>>> +            }
>>> +
>>>   #ifdef DEBUG_EXT
>>>               printf("Qcow2: Got dirty bitmaps extension:"
>>>                      " offset=%" PRIu64 " nb_bitmaps=%" PRIu32 "\n",
>>> @@ -928,8 +936,9 @@ static int qcow2_open(BlockDriverState *bs, QDict
>>> *options, int flags,
>>>       }
>>>         /* Clear unknown autoclear feature bits */
>>> -    if (!bs->read_only && !(flags & BDRV_O_INCOMING) &&
>>> s->autoclear_features) {
>>> -        s->autoclear_features = 0;
>>> +    if (!bs->read_only && !(flags & BDRV_O_INCOMING) &&
>>> +        (s->autoclear_features & ~QCOW2_AUTOCLEAR_MASK)) {
>>> +        s->autoclear_features |= QCOW2_AUTOCLEAR_MASK;
>> Like Stefan already mentioned, fixing this |= to &= will fix iotest 036,
>> which is otherwise broken by this patch.
>>
>>>           ret = qcow2_update_header(bs);
>>>           if (ret < 0) {
>>>               error_setg_errno(errp, -ret, "Could not update qcow2
>>> header");
>>> diff --git a/block/qcow2.h b/block/qcow2.h
>>> index b5e576c..14bd6f9 100644
>>> --- a/block/qcow2.h
>>> +++ b/block/qcow2.h
>>> @@ -215,6 +215,15 @@ enum {
>>>       QCOW2_COMPAT_FEAT_MASK            = QCOW2_COMPAT_LAZY_REFCOUNTS,
>>>   };
>>>   +/* Autoclear feature bits */
>>> +enum {
>>> +    QCOW2_AUTOCLEAR_DIRTY_BITMAPS_BITNR = 0,
>>> +    QCOW2_AUTOCLEAR_DIRTY_BITMAPS       =
>>> +        1 << QCOW2_AUTOCLEAR_DIRTY_BITMAPS_BITNR,
>>> +
>>> +    QCOW2_AUTOCLEAR_MASK                =
>>> QCOW2_AUTOCLEAR_DIRTY_BITMAPS,
>>> +};
>>> +
>> I find it a little awkward to have an enum with three different kinds of
>> data in it, unless I am reading this incorrectly. (bit position, bit
>> masks, and accumulated bit mask.)
>>
>> Just enumerating the indices is probably sufficient:
>>
>> enum {
>>    QCOW2_AUTOCLEAR_BEGIN = 0,
>>    QCOW2_AUTOCLEAR_DIRTY_BITMAPS = QCOW2_AUTOCLEAR_BEGIN,
>>    ...,
>>    QCOW2_AUTOCLEAR_END
>> }
>>
>> and then the QCOW2_AUTOCLEAR_MASK can either be programmatically defined
>> via a function, or just pre-computed as a #define.
>>
>> If you still want the mask definitions, you could do something cheeky
>> like this:
>>
>> #define AUTOCLEAR_MASK(X) (1 << QCOW2_AUTOCLEAR_ ## X)
>>
>> and then you can use things like AUTOCLEAR_MASK(DIRTY_BITMAPS) without
>> having to create and maintain two separate tables if you want both forms
>> easily available.
> 
> 
> This enum is made like enums for  QCOW2_INCOMPAT_* and QCOW2_COMPAT_*,
> which are already in the code... Then, may I make a patch for them too?
> I agree, it is strange solution to put things of different nature to one
> enum.
> 

Follow Kevin's lead, here -- It looked strange to me, but it _is_ best
to follow the existing style. I didn't look at the surrounding code too
carefully.

> 
>>
>>>   enum qcow2_discard_type {
>>>       QCOW2_DISCARD_NEVER = 0,
>>>       QCOW2_DISCARD_ALWAYS,
>>>
> 
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
  2015-06-08 15:21 [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps Vladimir Sementsov-Ogievskiy
                   ` (8 preceding siblings ...)
  2015-06-10 15:27 ` [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps Stefan Hajnoczi
@ 2015-06-11 20:06 ` Stefan Hajnoczi
  2015-06-12  9:58   ` Denis V. Lunev
  2015-06-12 19:34 ` John Snow
  10 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2015-06-11 20:06 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: kwolf, qemu-devel, pbonzini, den, jsnow

[-- Attachment #1: Type: text/plain, Size: 551 bytes --]

The load/store API is not scalable when bitmaps are 1 MB or larger.

For example, a 500 GB disk image with 64 KB granularity requires a 1 MB
bitmap.  If a guest has several disk images of this size, then multiple
megabytes must be read to start the guest and written out to shut down
the guest.

By comparison, the L1 table for the 500 GB disk image is less than 8 KB.

I think something like qcow2-cache.c or metabitmaps should be used to
lazily read/write persistent bitmaps.  That way only small portions need
to be read/written at a time.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 7/8] qemu: command line option for dirty bitmaps
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 7/8] qemu: command line option " Vladimir Sementsov-Ogievskiy
@ 2015-06-11 20:57   ` John Snow
  2015-06-12 21:49   ` John Snow
  1 sibling, 0 replies; 76+ messages in thread
From: John Snow @ 2015-06-11 20:57 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, pbonzini, Vladimir Sementsov-Ogievskiy, stefanha, den



On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> 
> The patch adds the following command line option:
> 
> -dirty-bitmap [option1=val1][,option2=val2]...
>     Available options are:
>     name         The name for the bitmap (necessary).
> 
>     file         The file to load the bitmap from.
> 
>     file_id      When specified with 'file' option, then this file will
>                  be available through this id for other -dirty-bitmap
>                  options when specified without 'file' option, then it
>                  is a reference to 'file', specified with another
>                  -dirty-bitmap option, and it will be used to load the
>                  bitmap from.
> 
>     drive        The drive to bind the bitmap to. It should be specified
>                  as 'id' suboption of one of -drive options. If nor
>                  'file' neither 'file_id' are specified, then the bitmap
>                  will be loaded from that drive (internal dirty bitmap).
> 
>     granularity  The granularity for the bitmap. Not necessary, the
>                  default value may be used.
> 
>     enabled      on|off. Default is 'on'. Disabled bitmaps are not
>                  changing regardless of writes to corresponding drive.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  blockdev.c                |  38 ++++++++++++++++++
>  include/sysemu/blockdev.h |   1 +
>  include/sysemu/sysemu.h   |   1 +
>  qemu-options.hx           |  37 +++++++++++++++++
>  vl.c                      | 100 ++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 177 insertions(+)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 5eaf77e..2a74395 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -176,6 +176,11 @@ QemuOpts *drive_def(const char *optstr)
>      return qemu_opts_parse(qemu_find_opts("drive"), optstr, 0);
>  }
>  
> +QemuOpts *dirty_bitmap_def(const char *optstr)
> +{
> +    return qemu_opts_parse(qemu_find_opts("dirty-bitmap"), optstr, 0);
> +}
> +
>  QemuOpts *drive_add(BlockInterfaceType type, int index, const char *file,
>                      const char *optstr)
>  {
> @@ -3093,6 +3098,39 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
>      return head;
>  }
>  
> +QemuOptsList qemu_dirty_bitmap_opts = {
> +    .name = "dirty-bitmap",
> +    .head = QTAILQ_HEAD_INITIALIZER(qemu_dirty_bitmap_opts.head),
> +    .desc = {
> +        {
> +            .name = "name",
> +            .type = QEMU_OPT_STRING,
> +            .help = "Name of the dirty bitmap",
> +        },{
> +            .name = "file",
> +            .type = QEMU_OPT_STRING,
> +            .help = "file name to load the bitmap from",
> +        },{
> +            .name = "file_id",
> +            .type = QEMU_OPT_STRING,
> +            .help = "node name to load the bitmap from (or to set id for"
> +                    " for file, opened by previous option)",
> +        },{
> +            .name = "drive",
> +            .type = QEMU_OPT_STRING,
> +            .help = "drive id to bind the bitmap to",
> +        },{
> +            .name = "granularity",
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "granularity",
> +        },{
> +            .name = "enabled",
> +            .type = QEMU_OPT_BOOL,
> +            .help = "enabled flag (default is 'on')",
> +        }
> +    }
> +};
> +
>  QemuOptsList qemu_common_drive_opts = {
>      .name = "drive",
>      .head = QTAILQ_HEAD_INITIALIZER(qemu_common_drive_opts.head),
> diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
> index 7ca59b5..5b101b8 100644
> --- a/include/sysemu/blockdev.h
> +++ b/include/sysemu/blockdev.h
> @@ -57,6 +57,7 @@ int drive_get_max_devs(BlockInterfaceType type);
>  DriveInfo *drive_get_next(BlockInterfaceType type);
>  
>  QemuOpts *drive_def(const char *optstr);
> +QemuOpts *dirty_bitmap_def(const char *optstr);
>  QemuOpts *drive_add(BlockInterfaceType type, int index, const char *file,
>                      const char *optstr);
>  DriveInfo *drive_new(QemuOpts *arg, BlockInterfaceType block_default_type);
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 8a52934..681a8f3 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -207,6 +207,7 @@ bool usb_enabled(void);
>  
>  extern QemuOptsList qemu_legacy_drive_opts;
>  extern QemuOptsList qemu_common_drive_opts;
> +extern QemuOptsList qemu_dirty_bitmap_opts;
>  extern QemuOptsList qemu_drive_opts;
>  extern QemuOptsList qemu_chardev_opts;
>  extern QemuOptsList qemu_device_opts;
> diff --git a/qemu-options.hx b/qemu-options.hx
> index ec356f6..5e93122 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -614,6 +614,43 @@ qemu-system-i386 -hda a -hdb b
>  @end example
>  ETEXI
>  
> +DEF("dirty-bitmap", HAS_ARG, QEMU_OPTION_dirty_bitmap,
> +    "-dirty-bitmap name=name[,file=file][,file_id=file_id][,drive=@var{id}]\n"
> +    "              [,granularity=granularity][,enabled=on|off]\n",
> +    QEMU_ARCH_ALL)
> +STEXI
> +@item -dirty-bitmap @var{option}[,@var{option}[,@var{option}[,...]]]
> +@findex -dirty-bitmap
> +
> +Define a dirty-bitmap. Valid options are:
> +
> +@table @option
> +@item name=@var{name}
> +The name of the bitmap. Should be unique per @var{file}/@var{drive} and per
> +@var{for_drive}.
> +@item file=@var{file}
> +The separate qcow2 file for loading the bitmap @var{name} from it.
> +@item file_id=@var{file_id}
> +When specified with @var{file} option, then this @var{file} will be available
> +through this @var{file_id} for other @option{-dirty-bitmap} options.
> +When specified without @var{file} option, then it is a reference to @var{file},
> +specified with another @option{-dirty-bitmap} option, and it will be used to
> +load the bitmap from.
> +@item drive=@var{drive}
> +The drive to bind the bitmap to. It should be specified as @var{id} suboption
> +of one of @option{-drive} options.
> +If nor @var{file} neither @var{file_id} are specified, then the bitmap will be
> +loaded from that drive (internal dirty bitmap).
> +@item granularity=@var{granularity}
> +Granularity (in bytes) for created dirty bitmap. If the bitmap is already
> +exists in specified @var{file}/@var{file_id}/@var{device} it's granularity will
> +not be changed but only checked (an error will be generated if this check
> +fails).
> +@item enabled=@var{enabled}
> +Enabled flag for the bitmap. By default the bitmap will be enabled.
> +@end table
> +ETEXI
> +
>  DEF("mtdblock", HAS_ARG, QEMU_OPTION_mtdblock,
>      "-mtdblock file  use 'file' as on-board Flash memory image\n",
>      QEMU_ARCH_ALL)
> diff --git a/vl.c b/vl.c
> index 83871f5..fb16d0c 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1091,6 +1091,95 @@ static int cleanup_add_fd(QemuOpts *opts, void *opaque)
>  #define MTD_OPTS ""
>  #define SD_OPTS ""
>  
> +static int dirty_bitmap_func(QemuOpts *opts, void *opaque)
> +{
> +    Error *local_err = NULL;
> +    Error **errp = &local_err;
> +    BlockDriverState *file_bs = NULL, *for_bs = NULL;
> +    BdrvDirtyBitmap *bitmap = NULL;
> +
> +    const char *name = qemu_opt_get(opts, "name");
> +    const char *drive = qemu_opt_get(opts, "drive");
> +    const char *file = qemu_opt_get(opts, "file");
> +    const char *file_id = qemu_opt_get(opts, "file_id");
> +
> +    uint64_t granularity = qemu_opt_get_number(opts, "granularity", 0);
> +    bool enabled = qemu_opt_get_bool(opts, "enabled", true);
> +
> +    if (name == NULL) {
> +        error_setg(errp, "'name' option is necessary");
> +        goto fail;
> +    }
> +
> +    if (drive == NULL) {
> +        error_setg(errp, "'drive' option is necessary");
> +        goto fail;
> +    }
> +
> +    for_bs = bdrv_lookup_bs(drive, NULL, errp);

I think in this case we can support arbitrary nodes as well as full
drives, since we can create bitmaps for these nodes. If there's another
reason why we couldn't for our first pass at this feature, we should
document it and remember to come back and fix it.

To that end, maybe "node" is better than "drive" for the CLI here, to
match the QMP interface.

> +    if (for_bs == NULL) {
> +        goto fail;
> +    }
> +
> +    if (file != NULL) {
> +        QDict *options = NULL;
> +        if (file_id != NULL) {
> +            options = qdict_new();
> +            qdict_put(options, "node-name", qstring_from_str(file_id));
> +        }
> +
> +        bdrv_open(&file_bs, file, NULL, options, 0, NULL, errp);
> +        if (options) {
> +            QDECREF(options);
> +        }
> +        if (file_bs == NULL) {
> +            goto fail;
> +        }
> +    } else if (file_id != NULL) {
> +        file_bs = bdrv_find_node(file_id);
> +        if (file_bs == NULL) {
> +            error_setg(errp, "node '%s' is not found", drive);
> +            goto fail;
> +        }
> +    } else {
> +        file_bs = for_bs;
> +    }
> +
> +    if (granularity == 0) {
> +        granularity = bdrv_get_default_bitmap_granularity(for_bs);
> +    }
> +
> +    bitmap = bdrv_load_dirty_bitmap(for_bs, file_bs, granularity, name,
> +                                    errp);
> +    if (*errp != NULL) {
> +        goto fail;
> +    }
> +
> +    if (bitmap == NULL) {
> +        /* bitmap is not found in file_bs */
> +        bitmap = bdrv_create_dirty_bitmap(for_bs, granularity, name, errp);
> +        if (!bitmap) {
> +            goto fail;
> +        }
> +    }
> +

I don't think we should auto-create the persistent file unless we are
explicitly trying to do that. I'll elaborate a little more in a larger
summary email for the whole series by the end of tomorrow.

> +    bdrv_dirty_bitmap_set_file(bitmap, file_bs);
> +
> +    if (!enabled) {
> +        bdrv_disable_dirty_bitmap(bitmap);
> +    }
> +
> +    return 0;
> +
> +fail:
> +    error_report("-dirty-bitmap: %s", error_get_pretty(local_err));
> +    error_free(local_err);
> +    if (file_bs != NULL) {
> +        bdrv_close(file_bs);
> +    }
> +    return -1;
> +}
> +
>  static int drive_init_func(QemuOpts *opts, void *opaque)
>  {
>      BlockInterfaceType *block_default_type = opaque;
> @@ -2790,6 +2879,7 @@ int main(int argc, char **argv, char **envp)
>      module_call_init(MODULE_INIT_QOM);
>  
>      qemu_add_opts(&qemu_drive_opts);
> +    qemu_add_opts(&qemu_dirty_bitmap_opts);
>      qemu_add_drive_opts(&qemu_legacy_drive_opts);
>      qemu_add_drive_opts(&qemu_common_drive_opts);
>      qemu_add_drive_opts(&qemu_drive_opts);
> @@ -2918,6 +3008,11 @@ int main(int argc, char **argv, char **envp)
>                      exit(1);
>                  }
>                  break;
> +            case QEMU_OPTION_dirty_bitmap:
> +                if (dirty_bitmap_def(optarg) == NULL) {
> +                    exit(1);
> +                }
> +                break;
>              case QEMU_OPTION_set:
>                  if (qemu_set_option(optarg) != 0)
>                      exit(1);
> @@ -4198,6 +4293,11 @@ int main(int argc, char **argv, char **envp)
>  
>      parse_numa_opts(machine_class);
>  
> +    if (qemu_opts_foreach(qemu_find_opts("dirty-bitmap"), dirty_bitmap_func,
> +                          NULL, 1) != 0) {
> +        exit(1);
> +    }
> +
>      if (qemu_opts_foreach(qemu_find_opts("mon"), mon_init_func, NULL, 1) != 0) {
>          exit(1);
>      }
> 

Seems okay overall, but I don't like the idea of needing to specify the
granularity to fetch a bitmap out of a file.

I suppose this parameter existed because of the overload with the desire
to CREATE a persistent bitmap at startup, where you would need to
specify a granularity, and perhaps that's fine.

I see that if you leave it un-set it defaults to 0 and we call the
default granularity fn() to get 64K, but we'll still fail to find the
bitmap if the granularity in file is not the default.

I'd like to see the granularity not play a role in the lookup at all.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature Vladimir Sementsov-Ogievskiy
  2015-06-09 16:52   ` Stefan Hajnoczi
  2015-06-10 14:30   ` Stefan Hajnoczi
@ 2015-06-11 23:04   ` John Snow
  2015-06-15 14:05     ` Vladimir Sementsov-Ogievskiy
  2015-06-12 21:55   ` John Snow
  2015-08-27 12:43   ` Vladimir Sementsov-Ogievskiy
  4 siblings, 1 reply; 76+ messages in thread
From: John Snow @ 2015-06-11 23:04 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, pbonzini, Vladimir Sementsov-Ogievskiy, stefanha, den



On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> 
> Adds dirty-bitmaps feature to qcow2 format as specified in
> docs/specs/qcow2.txt
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/Makefile.objs        |   2 +-
>  block/qcow2-dirty-bitmap.c | 503 +++++++++++++++++++++++++++++++++++++++++++++
>  block/qcow2.c              |  56 +++++
>  block/qcow2.h              |  50 +++++
>  include/block/block_int.h  |  10 +
>  5 files changed, 620 insertions(+), 1 deletion(-)
>  create mode 100644 block/qcow2-dirty-bitmap.c
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 0d8c2a4..bff12b4 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -1,5 +1,5 @@
>  block-obj-y += raw_bsd.o qcow.o vdi.o vmdk.o cloop.o bochs.o vpc.o vvfat.o
> -block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
> +block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o qcow2-dirty-bitmap.o
>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-obj-y += qed-check.o
>  block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
> diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
> new file mode 100644
> index 0000000..bc0167c
> --- /dev/null
> +++ b/block/qcow2-dirty-bitmap.c
> @@ -0,0 +1,503 @@
> +/*
> + * Dirty bitmpas for the QCOW version 2 format
> + *
> + * Copyright (c) 2014-2015 Vladimir Sementsov-Ogievskiy
> + *
> + * This file is derived from qcow2-snapshot.c, original copyright:
> + * Copyright (c) 2004-2006 Fabrice Bellard
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu-common.h"
> +#include "block/block_int.h"
> +#include "block/qcow2.h"
> +
> +void qcow2_free_dirty_bitmaps(BlockDriverState *bs)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +        g_free(s->dirty_bitmaps[i].name);
> +    }
> +    g_free(s->dirty_bitmaps);
> +    s->dirty_bitmaps = NULL;
> +    s->nb_dirty_bitmaps = 0;
> +}
> +
> +int qcow2_read_dirty_bitmaps(BlockDriverState *bs)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    QCowDirtyBitmapHeader h;
> +    QCowDirtyBitmap *bm;
> +    int i, name_size;
> +    int64_t offset;
> +    int ret;
> +
> +    if (!s->nb_dirty_bitmaps) {
> +        s->dirty_bitmaps = NULL;
> +        s->dirty_bitmaps_size = 0;
> +        return 0;
> +    }
> +
> +    offset = s->dirty_bitmaps_offset;
> +    s->dirty_bitmaps = g_new0(QCowDirtyBitmap, s->nb_dirty_bitmaps);
> +
> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +        /* Read statically sized part of the dirty_bitmap header */
> +        offset = align_offset(offset, 8);
> +        ret = bdrv_pread(bs->file, offset, &h, sizeof(h));
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +
> +        offset += sizeof(h);
> +        bm = s->dirty_bitmaps + i;
> +        bm->l1_table_offset = be64_to_cpu(h.l1_table_offset);
> +        bm->l1_size = be32_to_cpu(h.l1_size);
> +        bm->bitmap_granularity = be32_to_cpu(h.bitmap_granularity);
> +        bm->bitmap_size = be64_to_cpu(h.bitmap_size);
> +
> +        name_size = be16_to_cpu(h.name_size);
> +
> +        /* Read dirty_bitmap name */
> +        bm->name = g_malloc(name_size + 1);
> +        ret = bdrv_pread(bs->file, offset, bm->name, name_size);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        offset += name_size;
> +        bm->name[name_size] = '\0';
> +
> +        if (offset - s->dirty_bitmaps_offset > QCOW_MAX_DIRTY_BITMAPS_SIZE) {
> +            ret = -EFBIG;
> +            goto fail;
> +        }
> +    }
> +
> +    assert(offset - s->dirty_bitmaps_offset <= INT_MAX);
> +    s->dirty_bitmaps_size = offset - s->dirty_bitmaps_offset;
> +    return 0;
> +
> +fail:
> +    qcow2_free_dirty_bitmaps(bs);
> +    return ret;
> +}
> +
> +/* Add at the end of the file a new table of dirty bitmaps */
> +static int qcow2_write_dirty_bitmaps(BlockDriverState *bs)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    QCowDirtyBitmap *bm;
> +    QCowDirtyBitmapHeader h;
> +    int i, name_size, dirty_bitmaps_size;
> +    int64_t offset, dirty_bitmaps_offset = 0;
> +    int ret;
> +
> +    int old_dirty_bitmaps_size = s->dirty_bitmaps_size;
> +    int64_t old_dirty_bitmaps_offset = s->dirty_bitmaps_offset;
> +
> +    /* Compute the size of the dirty bitmaps table */
> +    offset = 0;
> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +        bm = s->dirty_bitmaps + i;
> +        offset = align_offset(offset, 8);
> +        offset += sizeof(h);
> +        offset += strlen(bm->name);
> +
> +        if (offset > QCOW_MAX_DIRTY_BITMAPS_SIZE) {
> +            ret = -EFBIG;
> +            goto fail;
> +        }
> +    }
> +
> +    assert(offset <= INT_MAX);
> +    dirty_bitmaps_size = offset;
> +
> +    /* Allocate space for the new dirty bitmap table */
> +    dirty_bitmaps_offset = qcow2_alloc_clusters(bs, dirty_bitmaps_size);
> +    offset = dirty_bitmaps_offset;
> +    if (offset < 0) {
> +        ret = offset;
> +        goto fail;
> +    }
> +    ret = bdrv_flush(bs);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    /* The dirty bitmap table position has not yet been updated, so these
> +     * clusters must indeed be completely free */
> +    ret = qcow2_pre_write_overlap_check(bs, 0, offset, dirty_bitmaps_size);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    /* Write all dirty bitmaps to the new table */
> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +        bm = s->dirty_bitmaps + i;
> +        memset(&h, 0, sizeof(h));
> +        h.l1_table_offset = cpu_to_be64(bm->l1_table_offset);
> +        h.l1_size = cpu_to_be32(bm->l1_size);
> +        h.bitmap_granularity = cpu_to_be32(bm->bitmap_granularity);
> +        h.bitmap_size = cpu_to_be64(bm->bitmap_size);
> +
> +        name_size = strlen(bm->name);
> +        assert(name_size <= UINT16_MAX);
> +        h.name_size = cpu_to_be16(name_size);
> +        offset = align_offset(offset, 8);
> +
> +        ret = bdrv_pwrite(bs->file, offset, &h, sizeof(h));
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        offset += sizeof(h);
> +
> +        ret = bdrv_pwrite(bs->file, offset, bm->name, name_size);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        offset += name_size;
> +    }
> +
> +    /*
> +     * Update the header extension to point to the new dirty bitmap table. This
> +     * requires the new table and its refcounts to be stable on disk.
> +     */
> +    ret = bdrv_flush(bs);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    s->dirty_bitmaps_offset = dirty_bitmaps_offset;
> +    s->dirty_bitmaps_size = dirty_bitmaps_size;
> +    ret = qcow2_update_header(bs);
> +    if (ret < 0) {
> +        fprintf(stderr, "Could not update qcow2 header\n");
> +        goto fail;
> +    }
> +
> +    /* Free old dirty bitmap table */
> +    qcow2_free_clusters(bs, old_dirty_bitmaps_offset, old_dirty_bitmaps_size,
> +                        QCOW2_DISCARD_ALWAYS);
> +    return 0;
> +
> +fail:
> +    if (dirty_bitmaps_offset > 0) {
> +        qcow2_free_clusters(bs, dirty_bitmaps_offset, dirty_bitmaps_size,
> +                            QCOW2_DISCARD_ALWAYS);
> +    }
> +    return ret;
> +}
> +
> +static int find_dirty_bitmap_by_name(BlockDriverState *bs,
> +                                     const char *name)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +        if (!strcmp(s->dirty_bitmaps[i].name, name)) {
> +            return i;
> +        }
> +    }
> +
> +    return -1;
> +}
> +
> +uint8_t *qcow2_dirty_bitmap_load(BlockDriverState *bs,
> +                            const char *name, uint64_t size,
> +                            int granularity)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int i, dirty_bitmap_index, ret;
> +    uint64_t offset;
> +    QCowDirtyBitmap *bm;
> +    uint64_t *l1_table;
> +    uint8_t *buf;
> +
> +    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
> +    if (dirty_bitmap_index < 0) {
> +        return NULL;
> +    }
> +    bm = &s->dirty_bitmaps[dirty_bitmap_index];
> +
> +    if (size != bm->bitmap_size || granularity != bm->bitmap_granularity) {
> +        return NULL;
> +    }
> +
> +    l1_table = g_malloc(bm->l1_size * sizeof(uint64_t));
> +    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
> +                     bm->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    buf = g_malloc0(bm->l1_size * s->cluster_size);
> +    for (i = 0; i < bm->l1_size; ++i) {
> +        offset = be64_to_cpu(l1_table[i]);
> +        if (!(offset & 1)) {
> +            ret = bdrv_pread(bs->file, offset, buf + i * s->cluster_size,
> +                             s->cluster_size);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        }
> +    }
> +
> +    g_free(l1_table);
> +    return buf;
> +
> +fail:
> +    g_free(l1_table);
> +    return NULL;
> +}
> +
> +int qcow2_dirty_bitmap_store(BlockDriverState *bs, uint8_t *buf,
> +                            const char *name, uint64_t size,
> +                            int granularity)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int cl_size = s->cluster_size;
> +    int i, dirty_bitmap_index, ret = 0, n;
> +    uint64_t *l1_table;
> +    QCowDirtyBitmap *bm;
> +    uint64_t buf_size;
> +    uint8_t *p;
> +    int sector_granularity = granularity >> BDRV_SECTOR_BITS;
> +
> +    /* find/create dirty bitmap */
> +    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
> +    if (dirty_bitmap_index >= 0) {
> +        bm = s->dirty_bitmaps + dirty_bitmap_index;
> +
> +        if (size != bm->bitmap_size ||
> +            granularity != bm->bitmap_granularity) {
> +            qcow2_dirty_bitmap_delete(bs, name, NULL);
> +            dirty_bitmap_index = -1;
> +        }
> +    }
> +    if (dirty_bitmap_index < 0) {
> +        qcow2_dirty_bitmap_create(bs, name, size, granularity);
> +        dirty_bitmap_index = s->nb_dirty_bitmaps - 1;
> +    }
> +    bm = s->dirty_bitmaps + dirty_bitmap_index;

I catch a segfault right around here if I do the following:

./x86_64-softmmu/qemu-system-x86_64 --dirty-bitmap
file=bitmaps.qcow2,name=bitmap0,drive=drive0 -drive
if=none,file=hda.qcow2,id=drive0 -device ide-hd,drive=drive0

hda.qcow2 and bitmaps.qcow2 are both empty files, but bitmaps.qcow2 has
a size of '0'.

Then when I click close in the QEMU GTK frontend, we hit a segfault when
trying to close because s->dirty_bitmaps is NULL, because it appears as
if we've never actually tried to add the (empty) bitmap to the (empty) file.

Your iotest works, but I am not actually sure why, because I don't
actually know how to *create* a persistent bitmap. I thought that the
-dirty-bitmap CLI would create one in the file specified with file=, but
it apparently only creates an in-memory bitmap and sets the file
pointer, but never initializes any of these structures. Then, when we go
to close, it gets confused and everything breaks a bit.

> +
> +    /* read l1 table */
> +    l1_table = g_malloc(bm->l1_size * sizeof(uint64_t));
> +    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
> +                     bm->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        goto finish;
> +    }
> +
> +    buf_size = (((size - 1) / sector_granularity) >> 3) + 1;
> +    buf_size = align_offset(buf_size, 4);
> +    n = buf_size / cl_size;
> +    p = buf;
> +    for (i = 0; i < bm->l1_size; ++i) {
> +        uint64_t addr = be64_to_cpu(l1_table[i]) & ~511;
> +        int write_size = (i == n ? (buf_size % cl_size) : cl_size);
> +
> +        if (buffer_is_zero(p, write_size)) {
> +            if (addr) {
> +                qcow2_free_clusters(bs, addr, cl_size,
> +                                    QCOW2_DISCARD_ALWAYS);
> +            }
> +            l1_table[i] = cpu_to_be64(1);
> +        } else {
> +            if (!addr) {
> +                addr = qcow2_alloc_clusters(bs, cl_size);
> +                l1_table[i] = cpu_to_be64(addr);
> +            }
> +
> +            ret = bdrv_pwrite(bs->file, addr, p, write_size);
> +            if (ret < 0) {
> +                goto finish;
> +            }
> +        }
> +
> +        p += cl_size;
> +    }
> +
> +    ret = bdrv_pwrite(bs->file, bm->l1_table_offset, l1_table,
> +                      bm->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        goto finish;
> +    }
> +
> +finish:
> +    g_free(l1_table);
> +    return ret;
> +}
> +/* if no id is provided, a new one is constructed */
> +int qcow2_dirty_bitmap_create(BlockDriverState *bs, const char *name,
> +                              uint64_t size, int granularity)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    QCowDirtyBitmap *new_dirty_bitmap_list = NULL;
> +    QCowDirtyBitmap *old_dirty_bitmap_list = NULL;
> +    QCowDirtyBitmap sn1, *bm = &sn1;
> +    int i, ret;
> +    uint64_t *l1_table = NULL;
> +    int64_t l1_table_offset;
> +    int sector_granularity = granularity >> BDRV_SECTOR_BITS;
> +
> +    if (s->nb_dirty_bitmaps >= QCOW_MAX_DIRTY_BITMAPS) {
> +        return -EFBIG;
> +    }
> +
> +    memset(bm, 0, sizeof(*bm));
> +
> +    /* Check that the ID is unique */
> +    if (find_dirty_bitmap_by_name(bs, name) >= 0) {
> +        return -EEXIST;
> +    }
> +
> +    /* Populate bm with passed data */
> +    bm->name = g_strdup(name);
> +    bm->bitmap_granularity = granularity;
> +    bm->bitmap_size = size;
> +
> +    bm->l1_size =
> +        size_to_clusters(s, (((size - 1) / sector_granularity) >> 3) + 1);
> +    l1_table_offset =
> +        qcow2_alloc_clusters(bs, s->l1_size * sizeof(uint64_t));
> +    if (l1_table_offset < 0) {
> +        ret = l1_table_offset;
> +        goto fail;
> +    }
> +    bm->l1_table_offset = l1_table_offset;
> +
> +    l1_table = g_try_new(uint64_t, bm->l1_size);
> +    if (l1_table == NULL) {
> +        ret = -ENOMEM;
> +        goto fail;
> +    }
> +
> +    /* initialize with zero clusters */
> +    for (i = 0; i < s->l1_size; i++) {
> +        l1_table[i] = cpu_to_be64(1);

bm->l1_size here in my crash output is just "1",
but s->l1_size is 16, so we crash all over this array.

I assume you meant bm->l1_size here. This is a good case to make against
calling everything "L1."

> +    }
> +
> +    ret = qcow2_pre_write_overlap_check(bs, 0, bm->l1_table_offset,
> +                                        s->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    ret = bdrv_pwrite(bs->file, bm->l1_table_offset, l1_table,
> +                      s->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    g_free(l1_table);

I can also catch a segfault here by doing something like this:

./x86_64-softmmu/qemu-system-x86_64 -drive
if=none,format=qcow2,cache=writethrough,file=hda.qcow2,id=drive0
-dirty-bitmap name=bitmap,drive=drive0

Trying to mimick your iotest which does not use an external bitmap file
-- it uses the implicit self-storage.

In this case, hda.qcow2 is still empty (but sized as 8GB) and I try to
quit before any writes occur.

freeing l1_table here causes memory corruption and even valgrind goes
down in flames:

==13284== Invalid write of size 8
==13284==    at 0x53A15E: qcow2_dirty_bitmap_create
(qcow2-dirty-bitmap.c:406)
==13284==    by 0x539D15: qcow2_dirty_bitmap_store
(qcow2-dirty-bitmap.c:307)
==13284==    by 0x505F27: bdrv_store_dirty_bitmap (block.c:3176)
==13284==    by 0x50306D: bdrv_close (block.c:1739)
==13284==    by 0x5032FF: bdrv_close_all (block.c:1797)
==13284==    by 0x3049DC: main (vl.c:4577)
==13284==  Address 0x239b7978 is 0 bytes after a block of size 8 alloc'd
==13284==    at 0x4A06BCF: malloc (vg_replace_malloc.c:296)
==13284==    by 0x300111: malloc_and_trace (vl.c:2706)
==13284==    by 0x62B954E: g_try_malloc (gmem.c:242)
==13284==    by 0x53A11E: qcow2_dirty_bitmap_create
(qcow2-dirty-bitmap.c:398)
==13284==    by 0x539D15: qcow2_dirty_bitmap_store
(qcow2-dirty-bitmap.c:307)
==13284==    by 0x505F27: bdrv_store_dirty_bitmap (block.c:3176)
==13284==    by 0x50306D: bdrv_close (block.c:1739)
==13284==    by 0x5032FF: bdrv_close_all (block.c:1797)
==13284==    by 0x3049DC: main (vl.c:4577)
==13284==
--13284-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11
(SIGSEGV) - exiting
--13284-- si_code=80;  Faulting address: 0x0;  sp: 0x8090a1de0

valgrind: the 'impossible' happened:
   Killed by fatal signal

> +    l1_table = NULL;
> +
> +    /* Append the new dirty bitmap to the dirty bitmap list */
> +    new_dirty_bitmap_list = g_new(QCowDirtyBitmap, s->nb_dirty_bitmaps + 1);
> +    if (s->dirty_bitmaps) {
> +        memcpy(new_dirty_bitmap_list, s->dirty_bitmaps,
> +               s->nb_dirty_bitmaps * sizeof(QCowDirtyBitmap));
> +        old_dirty_bitmap_list = s->dirty_bitmaps;
> +    }
> +    s->dirty_bitmaps = new_dirty_bitmap_list;
> +    s->dirty_bitmaps[s->nb_dirty_bitmaps++] = *bm;
> +
> +    ret = qcow2_write_dirty_bitmaps(bs);
> +    if (ret < 0) {
> +        g_free(s->dirty_bitmaps);
> +        s->dirty_bitmaps = old_dirty_bitmap_list;
> +        s->nb_dirty_bitmaps--;
> +        goto fail;
> +    }
> +
> +    g_free(old_dirty_bitmap_list);
> +
> +    return 0;
> +
Disk is 8GiB, 16,777,216 sectors, and bm->bitmap_size matches that.
> +fail:
> +    g_free(bm->name);
> +    g_free(l1_table);
> +
> +    return ret;
> +}
> +
> +static int qcow2_dirty_bitmap_free_clusters(BlockDriverState *bs,
> +                                            QCowDirtyBitmap *bm)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int ret, i;
> +    uint64_t *l1_table = g_new(uint64_t, bm->l1_size);
> +
> +    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
> +                     bm->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        g_free(l1_table);
> +        return ret;
> +    }
> +
> +    for (i = 0; i < bm->l1_size; ++i) {
> +        uint64_t addr = be64_to_cpu(l1_table[i]);
> +        qcow2_free_clusters(bs, addr, s->cluster_size, QCOW2_DISCARD_ALWAYS);
> +    }
> +
> +    qcow2_free_clusters(bs, bm->l1_table_offset, bm->l1_size * sizeof(uint64_t),
> +                        QCOW2_DISCARD_ALWAYS);
> +
> +    g_free(l1_table);
> +    return 0;
> +}
> +
> +int qcow2_dirty_bitmap_delete(BlockDriverState *bs,
> +                              const char *name,
> +                              Error **errp)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    QCowDirtyBitmap bm;
> +    int dirty_bitmap_index, ret = 0;
> +
> +    /* Search the dirty_bitmap */
> +    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
> +    if (dirty_bitmap_index < 0) {
> +        error_setg(errp, "Can't find the dirty bitmap");
> +        return -ENOENT;
> +    }
> +    bm = s->dirty_bitmaps[dirty_bitmap_index];
> +
> +    /* Remove it from the dirty_bitmap list */
> +    memmove(s->dirty_bitmaps + dirty_bitmap_index,
> +            s->dirty_bitmaps + dirty_bitmap_index + 1,
> +            (s->nb_dirty_bitmaps - dirty_bitmap_index - 1) * sizeof(bm));
> +    s->nb_dirty_bitmaps--;
> +    ret = qcow2_write_dirty_bitmaps(bs);
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret,
> +                         "Failed to remove dirty bitmap"
> +                         " from dirty bitmap list");
> +        return ret;
> +    }
> +
> +    qcow2_dirty_bitmap_free_clusters(bs, &bm);
> +    g_free(bm.name);
> +
> +    return ret;
> +}
> diff --git a/block/qcow2.c b/block/qcow2.c
> index b9a72e3..406e55d 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -61,6 +61,7 @@ typedef struct {
>  #define  QCOW2_EXT_MAGIC_END 0
>  #define  QCOW2_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA
>  #define  QCOW2_EXT_MAGIC_FEATURE_TABLE 0x6803f857
> +#define  QCOW2_EXT_MAGIC_DIRTY_BITMAPS 0x23852875
>  
>  static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
>  {
> @@ -90,6 +91,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>      QCowExtension ext;
>      uint64_t offset;
>      int ret;
> +    Qcow2DirtyBitmapHeaderExt dirty_bitmaps_ext;
>  
>  #ifdef DEBUG_EXT
>      printf("qcow2_read_extensions: start=%ld end=%ld\n", start_offset, end_offset);
> @@ -160,6 +162,33 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>              }
>              break;
>  
> +        case QCOW2_EXT_MAGIC_DIRTY_BITMAPS:
> +            ret = bdrv_pread(bs->file, offset, &dirty_bitmaps_ext, ext.len);
> +            if (ret < 0) {
> +                error_setg_errno(errp, -ret, "ERROR: dirty_bitmaps_ext: "
> +                                 "Could not read ext header");
> +                return ret;
> +            }
> +
> +            be64_to_cpus(&dirty_bitmaps_ext.dirty_bitmaps_offset);
> +            be32_to_cpus(&dirty_bitmaps_ext.nb_dirty_bitmaps);
> +
> +            s->dirty_bitmaps_offset = dirty_bitmaps_ext.dirty_bitmaps_offset;
> +            s->nb_dirty_bitmaps = dirty_bitmaps_ext.nb_dirty_bitmaps;
> +
> +            ret = qcow2_read_dirty_bitmaps(bs);
> +            if (ret < 0) {
> +                error_setg_errno(errp, -ret, "Could not read dirty bitmaps");
> +                return ret;
> +            }
> +
> +#ifdef DEBUG_EXT
> +            printf("Qcow2: Got dirty bitmaps extension:"
> +                   " offset=%" PRIu64 " nb_bitmaps=%" PRIu32 "\n",
> +                   s->dirty_bitmaps_offset, s->nb_dirty_bitmaps);
> +#endif
> +            break;
> +
>          default:
>              /* unknown magic - save it in case we need to rewrite the header */
>              {
> @@ -1000,6 +1029,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>      g_free(s->unknown_header_fields);
>      cleanup_unknown_header_ext(bs);
>      qcow2_free_snapshots(bs);
> +    qcow2_free_dirty_bitmaps(bs);
>      qcow2_refcount_close(bs);
>      qemu_vfree(s->l1_table);
>      /* else pre-write overlap checks in cache_destroy may crash */
> @@ -1466,6 +1496,7 @@ static void qcow2_close(BlockDriverState *bs)
>      qemu_vfree(s->cluster_data);
>      qcow2_refcount_close(bs);
>      qcow2_free_snapshots(bs);
> +    qcow2_free_dirty_bitmaps(bs);
>  }
>  
>  static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
> @@ -1667,6 +1698,21 @@ int qcow2_update_header(BlockDriverState *bs)
>      buf += ret;
>      buflen -= ret;
>  
> +    if (s->nb_dirty_bitmaps > 0) {
> +        Qcow2DirtyBitmapHeaderExt dirty_bitmaps_header = {
> +            .nb_dirty_bitmaps = cpu_to_be32(s->nb_dirty_bitmaps),
> +            .dirty_bitmaps_offset = cpu_to_be64(s->dirty_bitmaps_offset)
> +        };
> +        ret = header_ext_add(buf, QCOW2_EXT_MAGIC_DIRTY_BITMAPS,
> +                             &dirty_bitmaps_header, sizeof(dirty_bitmaps_header),
> +                             buflen);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        buf += ret;
> +        buflen -= ret;
> +    }
> +
>      /* Keep unknown header extensions */
>      QLIST_FOREACH(uext, &s->unknown_header_ext, next) {
>          ret = header_ext_add(buf, uext->magic, uext->data, uext->len, buflen);
> @@ -2176,6 +2222,12 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset)
>          return -ENOTSUP;
>      }
>  
> +    /* cannot proceed if image has dirty_bitmaps */
> +    if (s->nb_dirty_bitmaps) {
> +        error_report("Can't resize an image which has dirty bitmaps");
> +        return -ENOTSUP;
> +    }
> +
>      /* shrinking is currently not supported */
>      if (offset < bs->total_sectors * 512) {
>          error_report("qcow2 doesn't support shrinking images yet");
> @@ -2952,6 +3004,10 @@ BlockDriver bdrv_qcow2 = {
>      .bdrv_get_info          = qcow2_get_info,
>      .bdrv_get_specific_info = qcow2_get_specific_info,
>  
> +    .bdrv_dirty_bitmap_load = qcow2_dirty_bitmap_load,
> +    .bdrv_dirty_bitmap_store = qcow2_dirty_bitmap_store,
> +    .bdrv_dirty_bitmap_delete = qcow2_dirty_bitmap_delete,
> +
>      .bdrv_save_vmstate    = qcow2_save_vmstate,
>      .bdrv_load_vmstate    = qcow2_load_vmstate,
>  
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 422b825..24beee0 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -39,6 +39,7 @@
>  
>  #define QCOW_MAX_CRYPT_CLUSTERS 32
>  #define QCOW_MAX_SNAPSHOTS 65536
> +#define QCOW_MAX_DIRTY_BITMAPS 65536
>  
>  /* 8 MB refcount table is enough for 2 PB images at 64k cluster size
>   * (128 GB for 512 byte clusters, 2 EB for 2 MB clusters) */
> @@ -52,6 +53,8 @@
>   * space for snapshot names and IDs */
>  #define QCOW_MAX_SNAPSHOTS_SIZE (1024 * QCOW_MAX_SNAPSHOTS)
>  
> +#define QCOW_MAX_DIRTY_BITMAPS_SIZE (1024 * QCOW_MAX_DIRTY_BITMAPS)
> +
>  /* indicate that the refcount of the referenced cluster is exactly one. */
>  #define QCOW_OFLAG_COPIED     (1ULL << 63)
>  /* indicate that the cluster is compressed (they never have the copied flag) */
> @@ -138,6 +141,19 @@ typedef struct QEMU_PACKED QCowSnapshotHeader {
>      /* name follows  */
>  } QCowSnapshotHeader;
>  
> +typedef struct QEMU_PACKED QCowDirtyBitmapHeader {
> +    /* header is 8 byte aligned */
> +    uint64_t l1_table_offset;
> +
> +    uint32_t l1_size;
> +    uint32_t bitmap_granularity;
> +
> +    uint64_t bitmap_size;
> +    uint16_t name_size;
> +
> +    /* name follows  */
> +} QCowDirtyBitmapHeader;
> +
>  typedef struct QEMU_PACKED QCowSnapshotExtraData {
>      uint64_t vm_state_size_large;
>      uint64_t disk_size;
> @@ -156,6 +172,14 @@ typedef struct QCowSnapshot {
>      uint64_t vm_clock_nsec;
>  } QCowSnapshot;
>  
> +typedef struct QCowDirtyBitmap {
> +    uint64_t l1_table_offset;
> +    uint32_t l1_size;
> +    char *name;
> +    int bitmap_granularity;
> +    uint64_t bitmap_size;
> +} QCowDirtyBitmap;
> +
>  struct Qcow2Cache;
>  typedef struct Qcow2Cache Qcow2Cache;
>  
> @@ -218,6 +242,11 @@ typedef uint64_t Qcow2GetRefcountFunc(const void *refcount_array,
>  typedef void Qcow2SetRefcountFunc(void *refcount_array,
>                                    uint64_t index, uint64_t value);
>  
> +typedef struct Qcow2DirtyBitmapHeaderExt {
> +    uint32_t nb_dirty_bitmaps;
> +    uint64_t dirty_bitmaps_offset;
> +} QEMU_PACKED Qcow2DirtyBitmapHeaderExt;
> +
>  typedef struct BDRVQcowState {
>      int cluster_bits;
>      int cluster_size;
> @@ -259,6 +288,11 @@ typedef struct BDRVQcowState {
>      unsigned int nb_snapshots;
>      QCowSnapshot *snapshots;
>  
> +    uint64_t dirty_bitmaps_offset;
> +    int dirty_bitmaps_size;
> +    unsigned int nb_dirty_bitmaps;
> +    QCowDirtyBitmap *dirty_bitmaps;
> +
>      int flags;
>      int qcow_version;
>      bool use_lazy_refcounts;
> @@ -570,6 +604,22 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
>  void qcow2_free_snapshots(BlockDriverState *bs);
>  int qcow2_read_snapshots(BlockDriverState *bs);
>  
> +/* qcow2-dirty-bitmap.c functions */
> +int qcow2_dirty_bitmap_store(BlockDriverState *bs, uint8_t *buf,
> +                             const char *name, uint64_t size,
> +                             int granularity);
> +uint8_t *qcow2_dirty_bitmap_load(BlockDriverState *bs,
> +                                 const char *name, uint64_t size,
> +                                 int granularity);
> +int qcow2_dirty_bitmap_create(BlockDriverState *bs, const char *name,
> +                              uint64_t size, int granularity);
> +int qcow2_dirty_bitmap_delete(BlockDriverState *bs,
> +                              const char *name,
> +                              Error **errp);
> +
> +void qcow2_free_dirty_bitmaps(BlockDriverState *bs);
> +int qcow2_read_dirty_bitmaps(BlockDriverState *bs);
> +
>  /* qcow2-cache.c functions */
>  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
>  int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index db29b74..88855b4 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -206,6 +206,16 @@ struct BlockDriver {
>      int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi);
>      ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs);
>  
> +    int (*bdrv_dirty_bitmap_store)(BlockDriverState *bs, uint8_t *buf,
> +                                   const char *name, uint64_t size,
> +                                   int granularity);
> +    uint8_t *(*bdrv_dirty_bitmap_load)(BlockDriverState *bs,
> +                                       const char *name, uint64_t size,
> +                                       int granularity);
> +    int (*bdrv_dirty_bitmap_delete)(BlockDriverState *bs,
> +                                    const char *name,
> +                                    Error **errp);
> +
>      int (*bdrv_save_vmstate)(BlockDriverState *bs, QEMUIOVector *qiov,
>                               int64_t pos);
>      int (*bdrv_load_vmstate)(BlockDriverState *bs, uint8_t *buf,
> 


In light of this, some "sanity" tests that test cases like no writes,
empty bitmaps, empty files, etc I think will be appropriate.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-11 16:30       ` John Snow
@ 2015-06-12  8:33         ` Kevin Wolf
  0 siblings, 0 replies; 76+ messages in thread
From: Kevin Wolf @ 2015-06-12  8:33 UTC (permalink / raw)
  To: John Snow
  Cc: Vladimir Sementsov-Ogievskiy, qemu-devel,
	Vladimir Sementsov-Ogievskiy, stefanha, pbonzini, den

Am 11.06.2015 um 18:30 hat John Snow geschrieben:
> On 06/11/2015 06:25 AM, Vladimir Sementsov-Ogievskiy wrote:
> > On 10.06.2015 18:34, Kevin Wolf wrote:
> >> Am 08.06.2015 um 17:21 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>> +=== Bitmap table ===
> >>> +
> >>> +A directory of all bitmaps is stored in the bitmap table, a
> >>> contiguous area in
> >>> +the image file, whose starting offset and length are given by the
> >>> header fields
> >>> +dirty_bitmaps_offset and nb_dirty_bitmaps. The entries of the bitmap
> >>> table have
> >>> +variable length, depending on the length of name and extra data.
> >>> +
> >>> +Bitmap table entry:
> >>> +
> >>> +    Byte 0 -  7:    Offset into the image file at which the L1 table
> >>> for the
> >>> +                    bitmap starts. Must be aligned to a cluster
> >>> boundary.
> >>> +
> >>> +         8 - 11:    Number of entries in the L1 table of the bitmap
> >> Worth using 64 bits here? This can only cover 4 * 512 GB = 2 TB for the
> >> smallest possible cluster size. Though it's 65536 * 512 = 32 PB for the
> >> default, which might be enough for a while.
> >>
> >>> +        12 - 15:    Bitmap granularity in bytes
> >>> +
> >>> +        16 - 23:    Bitmap size in sectors
> >> Please don't use sectors, that's a meaningless unit. Bytes is better.
> > Just bad description. Actually it is ~ (number of bits in bitmap *
> > granularity), and it is corresponding to number of sectors in the image.
> 
> In defense of this, it does happen to be sectors, but what it /really/
> represents is the virtual addressable range of the bitmap (its 'size'),
> which just-so-happens to be a sector bitmap.

So not the size of the bitmap, but the size of (range in) the image that
is covered by the bitmap?

> We could just remove the word "sectors" entirely, and just flatly call
> it the bitmap size -- but this does reveal the internal nature of the
> block layer, which uses sector bitmaps.
> 
> If you wish, we can rework this field to use bytes and just convert on
> every load/store into the format that we actually require. I suppose
> it'd match the QMP interface in that way.

Internally we can do whatever we want, but what is stored in the image
format can't be changed later on, so it should be kept as generic as
possible.

How about "number of bits in the bitmap" as the unit for the size? And
possibly require that it's a multiple of 8.

Kevin

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
  2015-06-11 20:06 ` Stefan Hajnoczi
@ 2015-06-12  9:58   ` Denis V. Lunev
  2015-06-12 10:36     ` Stefan Hajnoczi
  0 siblings, 1 reply; 76+ messages in thread
From: Denis V. Lunev @ 2015-06-12  9:58 UTC (permalink / raw)
  To: Stefan Hajnoczi, Vladimir Sementsov-Ogievskiy
  Cc: kwolf, jsnow, qemu-devel, pbonzini

On 11/06/15 23:06, Stefan Hajnoczi wrote:
> The load/store API is not scalable when bitmaps are 1 MB or larger.
>
> For example, a 500 GB disk image with 64 KB granularity requires a 1 MB
> bitmap.  If a guest has several disk images of this size, then multiple
> megabytes must be read to start the guest and written out to shut down
> the guest.
>
> By comparison, the L1 table for the 500 GB disk image is less than 8 KB.
>
> I think something like qcow2-cache.c or metabitmaps should be used to
> lazily read/write persistent bitmaps.  That way only small portions need
> to be read/written at a time.
>
> Stefan
for the first iteration we could open the image, start tracking,
read bitmap as one entity in the background and or read
and collected data.

partial read could be done in the next step

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-11 16:21             ` John Snow
@ 2015-06-12 10:28               ` Stefan Hajnoczi
  2015-06-12 15:19                 ` John Snow
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2015-06-12 10:28 UTC (permalink / raw)
  To: John Snow
  Cc: kwolf, Vladimir Sementsov-Ogievskiy, Stefan Hajnoczi, qemu-devel,
	Vladimir Sementsov-Ogievskiy, den, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1547 bytes --]

On Thu, Jun 11, 2015 at 12:21:31PM -0400, John Snow wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
> 
> 
> 
> On 06/11/2015 09:03 AM, Stefan Hajnoczi wrote:
> > On Thu, Jun 11, 2015 at 01:19:24PM +0300, Vladimir
> > Sementsov-Ogievskiy wrote:
> >> On 10.06.2015 16:24, Stefan Hajnoczi wrote:
> >>> On Wed, Jun 10, 2015 at 11:19:30AM +0300, Vladimir
> >>> Sementsov-Ogievskiy wrote:
> >>>> On 09.06.2015 20:03, Stefan Hajnoczi wrote:
> >>>>> On Mon, Jun 08, 2015 at 06:21:19PM +0300, Vladimir
> >>>>> Sementsov-Ogievskiy wrote:
> >>>>>> @@ -166,6 +167,19 @@ the header extension data. Each
> >>>>>> entry look like this: terminated if it has full length) 
> >>>>>> +== Dirty bitmaps == + +Dirty bitmaps is an optional
> >>>>>> header extension. It provides a possibility of +storing
> >>>>>> dirty bitmaps in qcow2 image. The fields are: + +
> >>>>>> 0 -  3:  nb_dirty_bitmaps +                   Number of
> >>>>>> dirty bitmaps contained in the image
> >>>>> Is there a maximum?
> >>>> hmm. any proposals for this?
> >>> 65535 seems practical.
> >> 
> >> So, you suggest to reduce this field width to 2b? And additional
> >> 2 bytes reserved field, to achieve 8b-alignment?
> > 
> > No, I would leave it 32-bit but impose a little (which can be

s/little/limit/

> > increased later if necessary).  That's how nb_snapshots works too.
> > 
> 
> Doesn't the code already limit the number of bitmaps via +#define
> QCOW_MAX_DIRTY_BITMAPS 65536, from patch 2?

It needs to be in the specification.

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
  2015-06-12  9:58   ` Denis V. Lunev
@ 2015-06-12 10:36     ` Stefan Hajnoczi
  2015-08-26  6:26       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2015-06-12 10:36 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: kwolf, Vladimir Sementsov-Ogievskiy, jsnow, qemu-devel, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1348 bytes --]

On Fri, Jun 12, 2015 at 12:58:35PM +0300, Denis V. Lunev wrote:
> On 11/06/15 23:06, Stefan Hajnoczi wrote:
> >The load/store API is not scalable when bitmaps are 1 MB or larger.
> >
> >For example, a 500 GB disk image with 64 KB granularity requires a 1 MB
> >bitmap.  If a guest has several disk images of this size, then multiple
> >megabytes must be read to start the guest and written out to shut down
> >the guest.
> >
> >By comparison, the L1 table for the 500 GB disk image is less than 8 KB.
> >
> >I think something like qcow2-cache.c or metabitmaps should be used to
> >lazily read/write persistent bitmaps.  That way only small portions need
> >to be read/written at a time.
> >
> >Stefan
> for the first iteration we could open the image, start tracking,
> read bitmap as one entity in the background and or read
> and collected data.
> 
> partial read could be done in the next step

Making bitmap load/store fully lazy will require changes to the
load/store API, so it's worth thinking about a little upfront.
Otherwise there will be a lot of code churn when the fully lazy patches
are posted.  As a reviewer it's in my interest to only spend time
reviewing the final version instead of code that gets thrown out :-),
but I understand.

If you can make the read lazy to some extent that's a good start.

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-12 10:28               ` Stefan Hajnoczi
@ 2015-06-12 15:19                 ` John Snow
  0 siblings, 0 replies; 76+ messages in thread
From: John Snow @ 2015-06-12 15:19 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, Vladimir Sementsov-Ogievskiy, Stefan Hajnoczi, qemu-devel,
	Vladimir Sementsov-Ogievskiy, den, pbonzini



On 06/12/2015 06:28 AM, Stefan Hajnoczi wrote:
> On Thu, Jun 11, 2015 at 12:21:31PM -0400, John Snow wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>>
>>
>> On 06/11/2015 09:03 AM, Stefan Hajnoczi wrote:
>>> On Thu, Jun 11, 2015 at 01:19:24PM +0300, Vladimir
>>> Sementsov-Ogievskiy wrote:
>>>> On 10.06.2015 16:24, Stefan Hajnoczi wrote:
>>>>> On Wed, Jun 10, 2015 at 11:19:30AM +0300, Vladimir
>>>>> Sementsov-Ogievskiy wrote:
>>>>>> On 09.06.2015 20:03, Stefan Hajnoczi wrote:
>>>>>>> On Mon, Jun 08, 2015 at 06:21:19PM +0300, Vladimir
>>>>>>> Sementsov-Ogievskiy wrote:
>>>>>>>> @@ -166,6 +167,19 @@ the header extension data. Each
>>>>>>>> entry look like this: terminated if it has full length) 
>>>>>>>> +== Dirty bitmaps == + +Dirty bitmaps is an optional
>>>>>>>> header extension. It provides a possibility of +storing
>>>>>>>> dirty bitmaps in qcow2 image. The fields are: + +
>>>>>>>> 0 -  3:  nb_dirty_bitmaps +                   Number of
>>>>>>>> dirty bitmaps contained in the image
>>>>>>> Is there a maximum?
>>>>>> hmm. any proposals for this?
>>>>> 65535 seems practical.
>>>>
>>>> So, you suggest to reduce this field width to 2b? And additional
>>>> 2 bytes reserved field, to achieve 8b-alignment?
>>>
>>> No, I would leave it 32-bit but impose a little (which can be
> 
> s/little/limit/
> 
>>> increased later if necessary).  That's how nb_snapshots works too.
>>>
>>
>> Doesn't the code already limit the number of bitmaps via +#define
>> QCOW_MAX_DIRTY_BITMAPS 65536, from patch 2?
> 
> It needs to be in the specification.
> 

Yes, but the way the replies read made it sound like we hadn't decided
on what the limit *was*, so I was just trying to clarify for myself, here.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature
  2015-06-10 14:30   ` Stefan Hajnoczi
@ 2015-06-12 19:02     ` John Snow
  2015-06-15 14:42       ` Stefan Hajnoczi
  2015-08-14 17:14     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 76+ messages in thread
From: John Snow @ 2015-06-12 19:02 UTC (permalink / raw)
  To: Stefan Hajnoczi, Vladimir Sementsov-Ogievskiy
  Cc: kwolf, pbonzini, Vladimir Sementsov-Ogievskiy, qemu-devel, den



On 06/10/2015 10:30 AM, Stefan Hajnoczi wrote:
> On Mon, Jun 08, 2015 at 06:21:20PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 
> I noticed a corner case, it's probably not a problem in practice:
> 
> Since the dirty bitmap is stored with the help of a BlockDriverState
> (and its bs->file), it's possible that writing the bitmap will cause
> bits in the bitmap to be dirtied!
> 

But since it's metadata and not stored within a disk sector, can this
actually happen? Do you have an example of a scenario where this might
come up?

>> diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
>> new file mode 100644
>> index 0000000..bc0167c
>> --- /dev/null
>> +++ b/block/qcow2-dirty-bitmap.c
>> @@ -0,0 +1,503 @@
>> +/*
>> + * Dirty bitmpas for the QCOW version 2 format
> 
> s/bitmpas/bitmaps/
> 
>> +int qcow2_read_dirty_bitmaps(BlockDriverState *bs)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    QCowDirtyBitmapHeader h;
>> +    QCowDirtyBitmap *bm;
>> +    int i, name_size;
>> +    int64_t offset;
>> +    int ret;
>> +
>> +    if (!s->nb_dirty_bitmaps) {
>> +        s->dirty_bitmaps = NULL;
>> +        s->dirty_bitmaps_size = 0;
>> +        return 0;
>> +    }
>> +
>> +    offset = s->dirty_bitmaps_offset;
>> +    s->dirty_bitmaps = g_new0(QCowDirtyBitmap, s->nb_dirty_bitmaps);
> 
> Please use g_try_new0() and handle the NULL return value.
> g_new/g_malloc abort the process if there is not enough memory.  When
> opening untrusted image files it is possible that large values will be
> encountered and allocations fail.  In that case .bdrv_open() should fail
> instead of killing QEMU.
> 
> Using g_try_*() in QEMU is not an exact science but large data buffers
> or allocations where external inputs influence the size are good
> candidates.
> 
> Other allocations in these patches should do that too.
> 
>> +    /* Allocate space for the new dirty bitmap table */
>> +    dirty_bitmaps_offset = qcow2_alloc_clusters(bs, dirty_bitmaps_size);
>> +    offset = dirty_bitmaps_offset;
>> +    if (offset < 0) {
>> +        ret = offset;
>> +        goto fail;
>> +    }
>> +    ret = bdrv_flush(bs);
> 
> Not sure there is a need for this.  The clusters are inaccessible since
> no metadata points to them yet.  Therefore we don't need to flush yet
> because there is no risk of seeing an inconsistent state.
> 
>> +    /* Write all dirty bitmaps to the new table */
>> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
>> +        bm = s->dirty_bitmaps + i;
>> +        memset(&h, 0, sizeof(h));
>> +        h.l1_table_offset = cpu_to_be64(bm->l1_table_offset);
>> +        h.l1_size = cpu_to_be32(bm->l1_size);
>> +        h.bitmap_granularity = cpu_to_be32(bm->bitmap_granularity);
>> +        h.bitmap_size = cpu_to_be64(bm->bitmap_size);
>> +
>> +        name_size = strlen(bm->name);
>> +        assert(name_size <= UINT16_MAX);
>> +        h.name_size = cpu_to_be16(name_size);
>> +        offset = align_offset(offset, 8);
>> +
>> +        ret = bdrv_pwrite(bs->file, offset, &h, sizeof(h));
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +        offset += sizeof(h);
>> +
>> +        ret = bdrv_pwrite(bs->file, offset, bm->name, name_size);
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +        offset += name_size;
>> +    }
> 
> If files have many thousands of bitmaps then this loop will be slow.  It
> would be much faster to write out 1 cluster at a time.  This probably
> doesn't matter in practice since this function doesn't get called much
> and normally files will have few bitmaps.
> 
>> +
>> +    /*
>> +     * Update the header extension to point to the new dirty bitmap table. This
>> +     * requires the new table and its refcounts to be stable on disk.
>> +     */
>> +    ret = bdrv_flush(bs);
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    s->dirty_bitmaps_offset = dirty_bitmaps_offset;
>> +    s->dirty_bitmaps_size = dirty_bitmaps_size;
>> +    ret = qcow2_update_header(bs);
>> +    if (ret < 0) {
>> +        fprintf(stderr, "Could not update qcow2 header\n");
>> +        goto fail;
>> +    }
> 
> qcow2_update_header() does not flush.  We need to flush before freeing
> the old clusters in order to guarantee that the file now points to the
> new clusters.
> 
>> +uint8_t *qcow2_dirty_bitmap_load(BlockDriverState *bs,
>> +                            const char *name, uint64_t size,
>> +                            int granularity)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int i, dirty_bitmap_index, ret;
>> +    uint64_t offset;
>> +    QCowDirtyBitmap *bm;
>> +    uint64_t *l1_table;
>> +    uint8_t *buf;
>> +
>> +    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
>> +    if (dirty_bitmap_index < 0) {
>> +        return NULL;
>> +    }
>> +    bm = &s->dirty_bitmaps[dirty_bitmap_index];
>> +
>> +    if (size != bm->bitmap_size || granularity != bm->bitmap_granularity) {
>> +        return NULL;
>> +    }
>> +
>> +    l1_table = g_malloc(bm->l1_size * sizeof(uint64_t));
> 
> Please use g_try_malloc() with NULL handling.
> 
>> +    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
>> +                     bm->l1_size * sizeof(uint64_t));
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    buf = g_malloc0(bm->l1_size * s->cluster_size);
> 
> What is the maximum l1_size value?  cluster_size and l1_size are 32-bit
> so with 64 KB cluster_size this overflows if l1_size > 65535.  Do you
> want to cast to size_t?
> 
>> +    for (i = 0; i < bm->l1_size; ++i) {
>> +        offset = be64_to_cpu(l1_table[i]);
>> +        if (!(offset & 1)) {
> 
> This doesn't honor the 0 offset means unallocated cluster behavior for
> the Standard Cluster Descriptor from the qcow2 specification.
> 
>> +            ret = bdrv_pread(bs->file, offset, buf + i * s->cluster_size,
>> +                             s->cluster_size);
>> +            if (ret < 0) {
>> +                goto fail;
> 
> Missing g_free(buf)
> 
>> +    l1_table = g_try_new(uint64_t, bm->l1_size);
>> +    if (l1_table == NULL) {
>> +        ret = -ENOMEM;
>> +        goto fail;
>> +    }
>> +
>> +    /* initialize with zero clusters */
>> +    for (i = 0; i < s->l1_size; i++) {
>> +        l1_table[i] = cpu_to_be64(1);
>> +    }
>> +
>> +    ret = qcow2_pre_write_overlap_check(bs, 0, bm->l1_table_offset,
>> +                                        s->l1_size * sizeof(uint64_t));
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs->file, bm->l1_table_offset, l1_table,
>> +                      s->l1_size * sizeof(uint64_t));
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
> 
> Flush is needed here to ensure the bitmap has reached disk before the
> dirty_bitmaps array is written out.
> 
>> +
>> +    g_free(l1_table);
>> +    l1_table = NULL;
>> +
>> +    /* Append the new dirty bitmap to the dirty bitmap list */
>> +    new_dirty_bitmap_list = g_new(QCowDirtyBitmap, s->nb_dirty_bitmaps + 1);
>> +    if (s->dirty_bitmaps) {
>> +        memcpy(new_dirty_bitmap_list, s->dirty_bitmaps,
>> +               s->nb_dirty_bitmaps * sizeof(QCowDirtyBitmap));
>> +        old_dirty_bitmap_list = s->dirty_bitmaps;
>> +    }
>> +    s->dirty_bitmaps = new_dirty_bitmap_list;
>> +    s->dirty_bitmaps[s->nb_dirty_bitmaps++] = *bm;
>> +
>> +    ret = qcow2_write_dirty_bitmaps(bs);
>> +    if (ret < 0) {
>> +        g_free(s->dirty_bitmaps);
>> +        s->dirty_bitmaps = old_dirty_bitmap_list;
>> +        s->nb_dirty_bitmaps--;
>> +        goto fail;
>> +    }
>> +
>> +    g_free(old_dirty_bitmap_list);
>> +
>> +    return 0;
>> +
>> +fail:
>> +    g_free(bm->name);
>> +    g_free(l1_table);
>> +
> 
> The l1_table clusters should be freed on failure.
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
  2015-06-08 15:21 [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps Vladimir Sementsov-Ogievskiy
                   ` (9 preceding siblings ...)
  2015-06-11 20:06 ` Stefan Hajnoczi
@ 2015-06-12 19:34 ` John Snow
  2015-06-17 14:29   ` Vladimir Sementsov-Ogievskiy
  10 siblings, 1 reply; 76+ messages in thread
From: John Snow @ 2015-06-12 19:34 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, pbonzini, Fam Zheng, stefanha, den



On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
> v2:
>  - rebase on my 'Dirty bitmaps migration' series
>  - remove 'print dirty bitmap', 'query-dirty-bitmap' and use md5 for
>    testing like with dirty bitmaps migration
>  - autoclean features
> 
> v1:
> 
> The bitmaps are saved into qcow2 file format. It provides both
> 'internal' and 'external' dirty bitmaps feature:
>  - for qcow2 drives we can store bitmaps in the same file
>  - for other formats we can store bitmaps in the separate qcow2 file
> 
> QCow2 header is extended by fields 'nb_dirty_bitmaps' and
> 'dirty_bitmaps_offset' like with snapshots.
> 
> Proposed command line syntax is the following:
> 
> -dirty-bitmap [option1=val1][,option2=val2]...
>     Available options are:
>     name         The name for the bitmap (necessary).
> 
>     file         The file to load the bitmap from.
> 
>     file_id      When specified with 'file' option, then this file will
>                  be available through this id for other -dirty-bitmap
>                  options when specified without 'file' option, then it
>                  is a reference to 'file', specified with another
>                  -dirty-bitmap option, and it will be used to load the
>                  bitmap from.
> 
>     drive        The drive to bind the bitmap to. It should be specified
>                  as 'id' suboption of one of -drive options. If nor
>                  'file' neither 'file_id' are specified, then the bitmap
>                  will be loaded from that drive (internal dirty bitmap).
> 
>     granularity  The granularity for the bitmap. Not necessary, the
>                  default value may be used.
> 
>     enabled      on|off. Default is 'on'. Disabled bitmaps are not
>                  changing regardless of writes to corresponding drive.
> 
> Examples:
> 
> qemu -drive file=a.qcow2,id=disk -dirty-bitmap name=b,drive=disk
> qemu -drive file=a.raw,id=disk \
>      -dirty-bitmap name=b,drive=disk,file=b.qcow2,enabled=off
> 
> Vladimir Sementsov-Ogievskiy (8):
>   spec: add qcow2-dirty-bitmaps specification
>   qcow2: add dirty-bitmaps feature
>   block: store persistent dirty bitmaps
>   block: add bdrv_load_dirty_bitmap
>   qcow2: add qcow2_dirty_bitmap_delete_all
>   qcow2: add autoclear bit for dirty bitmaps
>   qemu: command line option for dirty bitmaps
>   iotests: test internal persistent dirty bitmap
> 
>  block.c                       |  82 +++++++
>  block/Makefile.objs           |   2 +-
>  block/qcow2-dirty-bitmap.c    | 537 ++++++++++++++++++++++++++++++++++++++++++
>  block/qcow2.c                 |  69 +++++-
>  block/qcow2.h                 |  61 +++++
>  blockdev.c                    |  38 +++
>  docs/specs/qcow2.txt          |  66 ++++++
>  include/block/block.h         |   9 +
>  include/block/block_int.h     |  10 +
>  include/sysemu/blockdev.h     |   1 +
>  include/sysemu/sysemu.h       |   1 +
>  qemu-options.hx               |  37 +++
>  tests/qemu-iotests/118        |  83 +++++++
>  tests/qemu-iotests/118.out    |   5 +
>  tests/qemu-iotests/group      |   1 +
>  tests/qemu-iotests/iotests.py |   6 +
>  vl.c                          | 100 ++++++++
>  17 files changed, 1105 insertions(+), 3 deletions(-)
>  create mode 100644 block/qcow2-dirty-bitmap.c
>  create mode 100755 tests/qemu-iotests/118
>  create mode 100644 tests/qemu-iotests/118.out
> 

Well, you said "RFC" ... So here's some "C" that you RF'd.

Many of these points are a "wish list" of sorts and don't necessarily
have to be implemented all at once, but we should be careful to design
the core series with the later additions in mind.

Many of these items are things that I wouldn't mind working on
(Primarily the QMP interfaces), provided that the core of this series
will allow for them to exist. I can take many of the QMP/transaction
interface projects, for instance.

I'm starting to think we won't be able to squeeze this in for 2.4, but
we can have a bulk of the work well underway for 2.5, by which point I
am hopeful that libvirt will be beginning to pick up motion for
integration of this feature.

I think that the basic approach you have so far is good, we just have to
plan out our required extensions and then we can review the base to make
sure it supports the features we want in the near future.


(1) General storage design

- Persistence bitmaps can be stored in any arbitrary qcow2 file,
regardless of if that qcow2 holds data or not.

- Any given qcow2 file with or without data can hold bitmaps intended
for any number of other drives.

- Dirty bitmaps are not assumed to be able to be stored in any
particular location.

So far, this is good. I like the flexibility this provides. This lets us
do all kinds of cool things like store bitmaps for 20 different raw
drives inside of a single 'bitmaps.qcow2' if we wish.


(2) Bitmaps added via QMP do not get any persistence attributes.

This is something we'll need to change. Existing QMP commands that let
us modify bitmaps:

block-dirty-bitmap-add		[+transaction]
block-dirty-bitmap-remove
block-dirty-bitmap-clear	[+transaction]

- block-dirty-bitmap-add:

We will want the ability for bitmap-add to specify a persistence option.
What I am less clear on is what this attribute should look like.

should we add target: <filename> as an attribute here,
or should it be target: <node> to specify the file object that we want
to store this bitmap in? Or perhaps both?:

mode: file, target: <filename>
mode: node, target: <node>

Or even an explicit usability feature that lets us specify that we wish
to store the bitmap for the drive we're attaching it to:

block-dirty-bitmap=add node=drive0 name=bitmap0 mode=self

The implication here is that the default value for persist could be
"none", which does not attempt to store this bitmap anywhere.

- block-dirty-bitmap-remove

If we remove a bitmap with persistence options active, it needs to be
cleared out of the file it is being stored in. Currently we use
"release" to remove a bitmap, which deletes only the in-memory portion
of the bitmap, so you also use release in your series to delete
in-memory bitmaps after we're done with them.

I think the semantics of the "remove" QMP option here, however, should
include a call to the storage layer to remove the bitmap in question.

Let's split the "release" function into two functions:
(A) bdrv_dirty_bitmap_free (which just frees the in-memory bits)
(B) bdrv_dirty_bitmap_delete (which relies on _free but deletes from
disk also.)

Then bdrv_close can use bitmap_free, but the QMP remove command can
utilize _delete.

- block-dirty-bitmap-clear:

This needs to clear the bitmap on-disk if it has persistence features
active.

- block-dirty-bitmap-copy:

This is only a proposal currently, but worth us keeping it in mind. We
should decide on copy semantics. Should the copy keep the persistence
attributes of the source bitmap by default and allow a user to override
it if desired, or should we force the persistence attribute back to
null/None until the user overrides?

I suspect defaulting it to no persistence is probably the sanest until
we're told otherwise (either via an extension to the copy command or a
later edit command.)

Since the QMP interfaces has been my area so far, I can draft their
addition as a new series if you'd like.


(3) Additional QMP interfaces

We should add the ability to modify a bitmap's persistence after it has
been added.

block-dirty-bitmap-edit mode=<file,node,self,none> target=<...>

This will allow us to add persistence to a bitmap after creation, or
remove persistence from a bitmap without deleting it if it's no longer
desired.

Perhaps at a later date we could even have it change where the bitmap is
stored through this mechanism.

(Usability features might include the ability for us to rename or change
the granularity of the bitmap, too -- but that's future usability stuff,
not core functionality.)

Like the above, I can draft this addition.


(4) Storage Format

I think overall the bitmap extension headers look sane, but Kevin is the
ultimate authority here.

I /would/ like to see an additional header bitfield reserved
for some arbitrary flags that can be used at a later date. A uint32_t
should be sufficient for now, with some of the upper bits reserved
either for an extension or a version field to allow us to expand the
bitmap headers in the future if necessary.


(5) Bitmap autoloading

Bitmaps are not currently automatically loaded if you pass e.g. (-hda
my_drive_that_also_has_bitmaps.qcow2). This is in part because the drive
a bitmap was intended for is not information stored with the bitmap, so
QEMU has no concept or ability to be able to "auto load" bitmaps.

Hinted at earlier by my desire to see something like mode=self, we
should add some flags to the dirty bitmap header stored with each bitmap:

0x01: "This bitmap describes the file it is stored in"
0x02: "This bitmap should be auto-loaded when this file is opened."
0x04: "This bitmap is read-only (disabled.)"

This way, with a properly modern version of QEMU, you could simply just:

qemu -M q35 -enable-kvm -hda windows10.qcow2

and if there were bitmaps inside of windows10.qcow2 that had 0x01 and
0x02 set, you'd get those bitmaps loaded before any IO to the data
clusters of the .qcow2, ensuring data integrity.

Of course, I think that it is currently too complicated to try to
accomplish autoloading of bitmaps for *other* drives, so let's not worry
about that now. This means 0x02 set without 0x01 would be an error.

Of course, when autoloading bitmaps, we'll have to check that the size
of the bitmap matches the size of the drive. This is easy to do, though.

The 0x01 bit can be set automatically when that circumstance is
detected, and 0x02 can be set perhaps as an option to
--dirty-bitmap auto=yes
or via the QMP
block-dirty-bitmap-add ... auto=yes
or via the edit command,
block-dirty-bitmap-edit ... auto=yes

Maybe we could also set it implicitly if mode=self is used, too.


(6) qemu-img interface

Stefan has mentioned that it would be nice to implement a query ability
to qemu-img to list bitmaps stored in qcow2 files, along with some of
their key attributes. size, granularity, any flags. It's probably not
efficient to list the dirty count, unless we begin storing that
information manually in the header. I don't think there's a strong need
for that level of info, though.

I can handle this part, if you'd like.


(7) CLI interface

- The only way to get a bitmap loaded into memory from file is to use
the --dirty-bitmap argument where you specify the name, file,
destination drive, and granularity.

- The only way to create a new bitmap that will integrate with the
persistence features is to specify a new bitmap that does not currently
exist within a file and allow the qcow2 layer to create the in-memory
bitmap for us.

This helps us with the flexibility that makes this design a winning
choice overall, but it's cumbersome for some special common cases I
think we should be supporting.

As mentioned previously, I think granularity should not be part
of the lookup process -- just creation, and even then I think this CLI
syntax should not automatically create bitmaps if it wasn't found -- if
the user didn't intend to make a bitmap, an error is likely more
appropriate.

Perhaps --dirty-bitmap create=true,[...] would be sufficient for
specifying intent here, at which point granularity makes sense for the
creation process.

As for the granularity, I think this should be appropriate:

--dirty-bitmap file=bitmaps.qcow2,name=bitmap0,drive=drive0

And that should be sufficient to look in bitmaps.qcow2, find 'bitmap0',
and attach it to 'drive0', throwing an error if the sizes don't match.


(8) Namespaces

Stefan also asked me about the bitmap namespaces -- in-memory of course,
each node can have their own "bitmap0" without any collisions because
all bitmaps are always referred to by their (bs,name) pair.

How do we address bitmaps inside a file, though?

If any given bitmap containing .qcow2 file can store an arbitrary number
of bitmaps intended for an arbitrary number of destinations, how do we
handle this?

EXAMPLE:
-dirty-bitmap name=bitmap0,drive=drive0,file=bitmaps.qcow2
-dirty-bitmap name=bitmap0,drive=drive1,file=bitmaps.qcow2

I think this might currently do very funky things, if bitmaps.qcow2 is
currently empty -- I think both calls will succeed, but it will fail
later when it tries to store them and cannot.

I think we need to do one of two things:

(A) Keep the namespace inside of a .qcow2 file as it is now, but ALWAYS
check up front if a bitmap *can* be added to the file. This way we don't
run into problems after we've dirtied the bitmap.

(B) Find a way to accommodate bitmaps with the same names that were
intended for different nodes.

I don't have a good idea for #2, so I think #1 is probably the way to
go. We can amend the bitmap documentation to specify that although the
bitmap names are unique per-node, if you want to store them in the same
file, you're going to want to give them globally unique names.


(9) Data consistency

We need to discuss the data safety element to this. I think that
atomically before the first write is flushed to disk, the dirty bitmap
needs to *at least* set a bit in the bitmap header that indicates that
the bitmap is no longer up-to-date.

When the bitmap is later flushed to disk, that bit can be cleared until
the next write occurs, which repeats the process.

We have discussed this (long ago) in the past, but one of the ideas was
to monitor the relative utilization rate of the disk and attempt to
flush the bitmap whenever there was a lull in disk IO, then clear the
"inconsistent" bit.

On close, the flush of data and bitmap both would lead us to clear this
bit as well.

Upon boot, if the inconsistent bit was set, we'd know that the bitmap
was outdated and we'd have to recommend that the bitmap be cleared and a
new bitmap started.

(Or, perhaps, a data-intensive mode where we compare the current data
mode with the most recent incremental backup to re-determine what data
has changed. This would be very, very slow but an option at least for
recovery if started a new full backup is even less desirable.)

Other ideas involve regularly flushing the bitmap at certain timed
intervals, certain usage intervals (e.g. when the changed bitmap data
reaches some total size, like 64KiB of changed bits), or a combination
of regular intervals with "opportunistic" flushing during Disk IO lulls.

This is a key feature that absolutely needs to make it into the base
series, IMO.


(10) Storage Efficiency

We should discuss the usage of meta bitmaps or ancillary bitmaps to
record which parts of our bitmap data need to be flushed to disk in
order to reduce flush/close time.

The current meta bitmap implementation optimizes for 1KiB writes to the
network (which fits well under the standard 1500bytes), but perhaps we
could optimize for local storage block size and use this to be stingy
about how much data we decide to write to disk.

I believe this is another feature that should be included in the initial
series as well, because it might radically impact the core design.


(11) Migration

Stefan already touched on this, but we should be mindful of the
different kinds of migration scenarios.

We might migrate the disks, or they might be shared already.

We might migrate (or share) a disk, but what happens if we didn't
migrate or didn't share the bitmap storage file that we were using?

Bitmaps without persistence data will migrate just fine, but how do we
intend to migrate the persistence data itself? I suppose as a first pass
we can just tap into the migration calls and migrate some properties like:

"This bitmap relies on node_id=xxxx to save its bitmap"

and that should probably work for either kind of storage migration
tactic. The only problem would be nodes without IDs that we opened by
filename ...

...Another technique would be for any bitmap that is persistent is to
store them all first prior to migration and then allow the destination
to load them anew. This would also work for either shared or migrated
storage if we worked it right.

It seems a little hairy, and I don't have the answers right now...
Something I will ponder on the weekend.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 7/8] qemu: command line option for dirty bitmaps
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 7/8] qemu: command line option " Vladimir Sementsov-Ogievskiy
  2015-06-11 20:57   ` John Snow
@ 2015-06-12 21:49   ` John Snow
  1 sibling, 0 replies; 76+ messages in thread
From: John Snow @ 2015-06-12 21:49 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, pbonzini, Vladimir Sementsov-Ogievskiy, stefanha, den



On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> 
> The patch adds the following command line option:
> 
> -dirty-bitmap [option1=val1][,option2=val2]...
>     Available options are:
>     name         The name for the bitmap (necessary).
> 
>     file         The file to load the bitmap from.
> 
>     file_id      When specified with 'file' option, then this file will
>                  be available through this id for other -dirty-bitmap
>                  options when specified without 'file' option, then it
>                  is a reference to 'file', specified with another
>                  -dirty-bitmap option, and it will be used to load the
>                  bitmap from.
> 
>     drive        The drive to bind the bitmap to. It should be specified
>                  as 'id' suboption of one of -drive options. If nor
>                  'file' neither 'file_id' are specified, then the bitmap
>                  will be loaded from that drive (internal dirty bitmap).
> 
>     granularity  The granularity for the bitmap. Not necessary, the
>                  default value may be used.
> 
>     enabled      on|off. Default is 'on'. Disabled bitmaps are not
>                  changing regardless of writes to corresponding drive.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  blockdev.c                |  38 ++++++++++++++++++
>  include/sysemu/blockdev.h |   1 +
>  include/sysemu/sysemu.h   |   1 +
>  qemu-options.hx           |  37 +++++++++++++++++
>  vl.c                      | 100 ++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 177 insertions(+)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 5eaf77e..2a74395 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -176,6 +176,11 @@ QemuOpts *drive_def(const char *optstr)
>      return qemu_opts_parse(qemu_find_opts("drive"), optstr, 0);
>  }
>  
> +QemuOpts *dirty_bitmap_def(const char *optstr)
> +{
> +    return qemu_opts_parse(qemu_find_opts("dirty-bitmap"), optstr, 0);
> +}
> +
>  QemuOpts *drive_add(BlockInterfaceType type, int index, const char *file,
>                      const char *optstr)
>  {
> @@ -3093,6 +3098,39 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
>      return head;
>  }
>  
> +QemuOptsList qemu_dirty_bitmap_opts = {
> +    .name = "dirty-bitmap",
> +    .head = QTAILQ_HEAD_INITIALIZER(qemu_dirty_bitmap_opts.head),
> +    .desc = {
> +        {
> +            .name = "name",
> +            .type = QEMU_OPT_STRING,
> +            .help = "Name of the dirty bitmap",
> +        },{
> +            .name = "file",
> +            .type = QEMU_OPT_STRING,
> +            .help = "file name to load the bitmap from",
> +        },{
> +            .name = "file_id",
> +            .type = QEMU_OPT_STRING,
> +            .help = "node name to load the bitmap from (or to set id for"
> +                    " for file, opened by previous option)",
> +        },{
> +            .name = "drive",
> +            .type = QEMU_OPT_STRING,
> +            .help = "drive id to bind the bitmap to",
> +        },{
> +            .name = "granularity",
> +            .type = QEMU_OPT_NUMBER,
> +            .help = "granularity",
> +        },{
> +            .name = "enabled",
> +            .type = QEMU_OPT_BOOL,
> +            .help = "enabled flag (default is 'on')",
> +        }
> +    }
> +};
> +
>  QemuOptsList qemu_common_drive_opts = {
>      .name = "drive",
>      .head = QTAILQ_HEAD_INITIALIZER(qemu_common_drive_opts.head),
> diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
> index 7ca59b5..5b101b8 100644
> --- a/include/sysemu/blockdev.h
> +++ b/include/sysemu/blockdev.h
> @@ -57,6 +57,7 @@ int drive_get_max_devs(BlockInterfaceType type);
>  DriveInfo *drive_get_next(BlockInterfaceType type);
>  
>  QemuOpts *drive_def(const char *optstr);
> +QemuOpts *dirty_bitmap_def(const char *optstr);
>  QemuOpts *drive_add(BlockInterfaceType type, int index, const char *file,
>                      const char *optstr);
>  DriveInfo *drive_new(QemuOpts *arg, BlockInterfaceType block_default_type);
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 8a52934..681a8f3 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -207,6 +207,7 @@ bool usb_enabled(void);
>  
>  extern QemuOptsList qemu_legacy_drive_opts;
>  extern QemuOptsList qemu_common_drive_opts;
> +extern QemuOptsList qemu_dirty_bitmap_opts;
>  extern QemuOptsList qemu_drive_opts;
>  extern QemuOptsList qemu_chardev_opts;
>  extern QemuOptsList qemu_device_opts;
> diff --git a/qemu-options.hx b/qemu-options.hx
> index ec356f6..5e93122 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -614,6 +614,43 @@ qemu-system-i386 -hda a -hdb b
>  @end example
>  ETEXI
>  
> +DEF("dirty-bitmap", HAS_ARG, QEMU_OPTION_dirty_bitmap,
> +    "-dirty-bitmap name=name[,file=file][,file_id=file_id][,drive=@var{id}]\n"
> +    "              [,granularity=granularity][,enabled=on|off]\n",
> +    QEMU_ARCH_ALL)
> +STEXI
> +@item -dirty-bitmap @var{option}[,@var{option}[,@var{option}[,...]]]
> +@findex -dirty-bitmap
> +
> +Define a dirty-bitmap. Valid options are:
> +
> +@table @option
> +@item name=@var{name}
> +The name of the bitmap. Should be unique per @var{file}/@var{drive} and per
> +@var{for_drive}.
> +@item file=@var{file}
> +The separate qcow2 file for loading the bitmap @var{name} from it.
> +@item file_id=@var{file_id}
> +When specified with @var{file} option, then this @var{file} will be available
> +through this @var{file_id} for other @option{-dirty-bitmap} options.
> +When specified without @var{file} option, then it is a reference to @var{file},
> +specified with another @option{-dirty-bitmap} option, and it will be used to
> +load the bitmap from.
> +@item drive=@var{drive}
> +The drive to bind the bitmap to. It should be specified as @var{id} suboption
> +of one of @option{-drive} options.
> +If nor @var{file} neither @var{file_id} are specified, then the bitmap will be
> +loaded from that drive (internal dirty bitmap).
> +@item granularity=@var{granularity}
> +Granularity (in bytes) for created dirty bitmap. If the bitmap is already
> +exists in specified @var{file}/@var{file_id}/@var{device} it's granularity will
> +not be changed but only checked (an error will be generated if this check
> +fails).
> +@item enabled=@var{enabled}
> +Enabled flag for the bitmap. By default the bitmap will be enabled.
> +@end table
> +ETEXI
> +
>  DEF("mtdblock", HAS_ARG, QEMU_OPTION_mtdblock,
>      "-mtdblock file  use 'file' as on-board Flash memory image\n",
>      QEMU_ARCH_ALL)
> diff --git a/vl.c b/vl.c
> index 83871f5..fb16d0c 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1091,6 +1091,95 @@ static int cleanup_add_fd(QemuOpts *opts, void *opaque)
>  #define MTD_OPTS ""
>  #define SD_OPTS ""
>  
> +static int dirty_bitmap_func(QemuOpts *opts, void *opaque)
> +{
> +    Error *local_err = NULL;
> +    Error **errp = &local_err;
> +    BlockDriverState *file_bs = NULL, *for_bs = NULL;
> +    BdrvDirtyBitmap *bitmap = NULL;
> +
> +    const char *name = qemu_opt_get(opts, "name");
> +    const char *drive = qemu_opt_get(opts, "drive");
> +    const char *file = qemu_opt_get(opts, "file");
> +    const char *file_id = qemu_opt_get(opts, "file_id");
> +
> +    uint64_t granularity = qemu_opt_get_number(opts, "granularity", 0);
> +    bool enabled = qemu_opt_get_bool(opts, "enabled", true);
> +
> +    if (name == NULL) {
> +        error_setg(errp, "'name' option is necessary");
> +        goto fail;
> +    }
> +
> +    if (drive == NULL) {
> +        error_setg(errp, "'drive' option is necessary");
> +        goto fail;
> +    }
> +
> +    for_bs = bdrv_lookup_bs(drive, NULL, errp);
> +    if (for_bs == NULL) {
> +        goto fail;
> +    }
> +
> +    if (file != NULL) {
> +        QDict *options = NULL;
> +        if (file_id != NULL) {
> +            options = qdict_new();
> +            qdict_put(options, "node-name", qstring_from_str(file_id));
> +        }
> +
> +        bdrv_open(&file_bs, file, NULL, options, 0, NULL, errp);

This will open the file read-only without BDRV_O_RDWR in the flags field
(at least), so this doesn't work.

Please add a test case for specifying file= in addition to the current
one that tests the "file_bs = for_bs" case below.

> +        if (options) {
> +            QDECREF(options);
> +        }
> +        if (file_bs == NULL) {
> +            goto fail;
> +        }
> +    } else if (file_id != NULL) {
> +        file_bs = bdrv_find_node(file_id);
> +        if (file_bs == NULL) {
> +            error_setg(errp, "node '%s' is not found", drive);
> +            goto fail;
> +        }
> +    } else {
> +        file_bs = for_bs;
> +    }
> +
> +    if (granularity == 0) {
> +        granularity = bdrv_get_default_bitmap_granularity(for_bs);
> +    }
> +
> +    bitmap = bdrv_load_dirty_bitmap(for_bs, file_bs, granularity, name,
> +                                    errp);
> +    if (*errp != NULL) {
> +        goto fail;
> +    }
> +
> +    if (bitmap == NULL) {
> +        /* bitmap is not found in file_bs */
> +        bitmap = bdrv_create_dirty_bitmap(for_bs, granularity, name, errp);
> +        if (!bitmap) {
> +            goto fail;
> +        }
> +    }
> +
> +    bdrv_dirty_bitmap_set_file(bitmap, file_bs);
> +
> +    if (!enabled) {
> +        bdrv_disable_dirty_bitmap(bitmap);
> +    }
> +
> +    return 0;
> +
> +fail:
> +    error_report("-dirty-bitmap: %s", error_get_pretty(local_err));
> +    error_free(local_err);
> +    if (file_bs != NULL) {
> +        bdrv_close(file_bs);
> +    }
> +    return -1;
> +}
> +
>  static int drive_init_func(QemuOpts *opts, void *opaque)
>  {
>      BlockInterfaceType *block_default_type = opaque;
> @@ -2790,6 +2879,7 @@ int main(int argc, char **argv, char **envp)
>      module_call_init(MODULE_INIT_QOM);
>  
>      qemu_add_opts(&qemu_drive_opts);
> +    qemu_add_opts(&qemu_dirty_bitmap_opts);
>      qemu_add_drive_opts(&qemu_legacy_drive_opts);
>      qemu_add_drive_opts(&qemu_common_drive_opts);
>      qemu_add_drive_opts(&qemu_drive_opts);
> @@ -2918,6 +3008,11 @@ int main(int argc, char **argv, char **envp)
>                      exit(1);
>                  }
>                  break;
> +            case QEMU_OPTION_dirty_bitmap:
> +                if (dirty_bitmap_def(optarg) == NULL) {
> +                    exit(1);
> +                }
> +                break;
>              case QEMU_OPTION_set:
>                  if (qemu_set_option(optarg) != 0)
>                      exit(1);
> @@ -4198,6 +4293,11 @@ int main(int argc, char **argv, char **envp)
>  
>      parse_numa_opts(machine_class);
>  
> +    if (qemu_opts_foreach(qemu_find_opts("dirty-bitmap"), dirty_bitmap_func,
> +                          NULL, 1) != 0) {
> +        exit(1);
> +    }
> +
>      if (qemu_opts_foreach(qemu_find_opts("mon"), mon_init_func, NULL, 1) != 0) {
>          exit(1);
>      }
> 

-- 
—js

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature Vladimir Sementsov-Ogievskiy
                     ` (2 preceding siblings ...)
  2015-06-11 23:04   ` John Snow
@ 2015-06-12 21:55   ` John Snow
  2015-08-26 13:15     ` Vladimir Sementsov-Ogievskiy
  2015-08-27 12:43   ` Vladimir Sementsov-Ogievskiy
  4 siblings, 1 reply; 76+ messages in thread
From: John Snow @ 2015-06-12 21:55 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, pbonzini, Vladimir Sementsov-Ogievskiy, stefanha, den



On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> 
> Adds dirty-bitmaps feature to qcow2 format as specified in
> docs/specs/qcow2.txt
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/Makefile.objs        |   2 +-
>  block/qcow2-dirty-bitmap.c | 503 +++++++++++++++++++++++++++++++++++++++++++++
>  block/qcow2.c              |  56 +++++
>  block/qcow2.h              |  50 +++++
>  include/block/block_int.h  |  10 +
>  5 files changed, 620 insertions(+), 1 deletion(-)
>  create mode 100644 block/qcow2-dirty-bitmap.c
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 0d8c2a4..bff12b4 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -1,5 +1,5 @@
>  block-obj-y += raw_bsd.o qcow.o vdi.o vmdk.o cloop.o bochs.o vpc.o vvfat.o
> -block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
> +block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o qcow2-dirty-bitmap.o
>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-obj-y += qed-check.o
>  block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
> diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
> new file mode 100644
> index 0000000..bc0167c
> --- /dev/null
> +++ b/block/qcow2-dirty-bitmap.c
> @@ -0,0 +1,503 @@
> +/*
> + * Dirty bitmpas for the QCOW version 2 format
> + *
> + * Copyright (c) 2014-2015 Vladimir Sementsov-Ogievskiy
> + *
> + * This file is derived from qcow2-snapshot.c, original copyright:
> + * Copyright (c) 2004-2006 Fabrice Bellard
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu-common.h"
> +#include "block/block_int.h"
> +#include "block/qcow2.h"
> +
> +void qcow2_free_dirty_bitmaps(BlockDriverState *bs)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +        g_free(s->dirty_bitmaps[i].name);
> +    }
> +    g_free(s->dirty_bitmaps);
> +    s->dirty_bitmaps = NULL;
> +    s->nb_dirty_bitmaps = 0;
> +}
> +

OK.

> +int qcow2_read_dirty_bitmaps(BlockDriverState *bs)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    QCowDirtyBitmapHeader h;
> +    QCowDirtyBitmap *bm;
> +    int i, name_size;
> +    int64_t offset;
> +    int ret;
> +
> +    if (!s->nb_dirty_bitmaps) {
> +        s->dirty_bitmaps = NULL;
> +        s->dirty_bitmaps_size = 0;
> +        return 0;
> +    }
> +
> +    offset = s->dirty_bitmaps_offset;
> +    s->dirty_bitmaps = g_new0(QCowDirtyBitmap, s->nb_dirty_bitmaps);
> +
> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +        /* Read statically sized part of the dirty_bitmap header */
> +        offset = align_offset(offset, 8);
> +        ret = bdrv_pread(bs->file, offset, &h, sizeof(h));
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +
> +        offset += sizeof(h);
> +        bm = s->dirty_bitmaps + i;
> +        bm->l1_table_offset = be64_to_cpu(h.l1_table_offset);
> +        bm->l1_size = be32_to_cpu(h.l1_size);
> +        bm->bitmap_granularity = be32_to_cpu(h.bitmap_granularity);
> +        bm->bitmap_size = be64_to_cpu(h.bitmap_size);
> +
> +        name_size = be16_to_cpu(h.name_size);
> +
> +        /* Read dirty_bitmap name */
> +        bm->name = g_malloc(name_size + 1);
> +        ret = bdrv_pread(bs->file, offset, bm->name, name_size);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        offset += name_size;
> +        bm->name[name_size] = '\0';
> +
> +        if (offset - s->dirty_bitmaps_offset > QCOW_MAX_DIRTY_BITMAPS_SIZE) {
> +            ret = -EFBIG;
> +            goto fail;
> +        }
> +    }
> +
> +    assert(offset - s->dirty_bitmaps_offset <= INT_MAX);
> +    s->dirty_bitmaps_size = offset - s->dirty_bitmaps_offset;
> +    return 0;
> +
> +fail:
> +    qcow2_free_dirty_bitmaps(bs);
> +    return ret;
> +}
> +

OK

> +/* Add at the end of the file a new table of dirty bitmaps */
> +static int qcow2_write_dirty_bitmaps(BlockDriverState *bs)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    QCowDirtyBitmap *bm;
> +    QCowDirtyBitmapHeader h;
> +    int i, name_size, dirty_bitmaps_size;
> +    int64_t offset, dirty_bitmaps_offset = 0;
> +    int ret;
> +
> +    int old_dirty_bitmaps_size = s->dirty_bitmaps_size;
> +    int64_t old_dirty_bitmaps_offset = s->dirty_bitmaps_offset;
> +
> +    /* Compute the size of the dirty bitmaps table */
> +    offset = 0;
> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +        bm = s->dirty_bitmaps + i;
> +        offset = align_offset(offset, 8);
> +        offset += sizeof(h);
> +        offset += strlen(bm->name);
> +
> +        if (offset > QCOW_MAX_DIRTY_BITMAPS_SIZE) {
> +            ret = -EFBIG;
> +            goto fail;
> +        }
> +    }
> +
> +    assert(offset <= INT_MAX);
> +    dirty_bitmaps_size = offset;
> +
> +    /* Allocate space for the new dirty bitmap table */
> +    dirty_bitmaps_offset = qcow2_alloc_clusters(bs, dirty_bitmaps_size);
> +    offset = dirty_bitmaps_offset;
> +    if (offset < 0) {
> +        ret = offset;
> +        goto fail;
> +    }
> +    ret = bdrv_flush(bs);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    /* The dirty bitmap table position has not yet been updated, so these
> +     * clusters must indeed be completely free */
> +    ret = qcow2_pre_write_overlap_check(bs, 0, offset, dirty_bitmaps_size);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    /* Write all dirty bitmaps to the new table */
> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +        bm = s->dirty_bitmaps + i;
> +        memset(&h, 0, sizeof(h));
> +        h.l1_table_offset = cpu_to_be64(bm->l1_table_offset);
> +        h.l1_size = cpu_to_be32(bm->l1_size);
> +        h.bitmap_granularity = cpu_to_be32(bm->bitmap_granularity);
> +        h.bitmap_size = cpu_to_be64(bm->bitmap_size);
> +
> +        name_size = strlen(bm->name);
> +        assert(name_size <= UINT16_MAX);
> +        h.name_size = cpu_to_be16(name_size);
> +        offset = align_offset(offset, 8);
> +
> +        ret = bdrv_pwrite(bs->file, offset, &h, sizeof(h));
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        offset += sizeof(h);
> +
> +        ret = bdrv_pwrite(bs->file, offset, bm->name, name_size);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        offset += name_size;
> +    }
> +
> +    /*
> +     * Update the header extension to point to the new dirty bitmap table. This
> +     * requires the new table and its refcounts to be stable on disk.
> +     */
> +    ret = bdrv_flush(bs);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    s->dirty_bitmaps_offset = dirty_bitmaps_offset;
> +    s->dirty_bitmaps_size = dirty_bitmaps_size;
> +    ret = qcow2_update_header(bs);
> +    if (ret < 0) {
> +        fprintf(stderr, "Could not update qcow2 header\n");
> +        goto fail;
> +    }
> +
> +    /* Free old dirty bitmap table */
> +    qcow2_free_clusters(bs, old_dirty_bitmaps_offset, old_dirty_bitmaps_size,
> +                        QCOW2_DISCARD_ALWAYS);
> +    return 0;
> +
> +fail:
> +    if (dirty_bitmaps_offset > 0) {
> +        qcow2_free_clusters(bs, dirty_bitmaps_offset, dirty_bitmaps_size,
> +                            QCOW2_DISCARD_ALWAYS);
> +    }
> +    return ret;
> +}
> +
> +static int find_dirty_bitmap_by_name(BlockDriverState *bs,
> +                                     const char *name)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +        if (!strcmp(s->dirty_bitmaps[i].name, name)) {
> +            return i;
> +        }
> +    }
> +
> +    return -1;
> +}
> +

OK

> +uint8_t *qcow2_dirty_bitmap_load(BlockDriverState *bs,
> +                            const char *name, uint64_t size,
> +                            int granularity)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int i, dirty_bitmap_index, ret;
> +    uint64_t offset;
> +    QCowDirtyBitmap *bm;
> +    uint64_t *l1_table;
> +    uint8_t *buf;
> +
> +    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
> +    if (dirty_bitmap_index < 0) {
> +        return NULL;
> +    }
> +    bm = &s->dirty_bitmaps[dirty_bitmap_index];
> +
> +    if (size != bm->bitmap_size || granularity != bm->bitmap_granularity) {
> +        return NULL;
> +    }
> +
> +    l1_table = g_malloc(bm->l1_size * sizeof(uint64_t));
> +    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
> +                     bm->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    buf = g_malloc0(bm->l1_size * s->cluster_size);
> +    for (i = 0; i < bm->l1_size; ++i) {
> +        offset = be64_to_cpu(l1_table[i]);
> +        if (!(offset & 1)) {
> +            ret = bdrv_pread(bs->file, offset, buf + i * s->cluster_size,
> +                             s->cluster_size);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        }
> +    }
> +
> +    g_free(l1_table);
> +    return buf;
> +
> +fail:
> +    g_free(l1_table);
> +    return NULL;
> +}
> +

OK, though the prototype strikes me as strange: what's the use case for
making the caller specify the size and granularity in order to be able
to see if the bitmap is present?

This makes it hard to tell the difference between "The requested bitmap
wasn't found" and "The requested bitmap did not match the expected
attributes."

Why not just let the caller verify the bitmap received matches their
expectations and leave this as a (bs, name) pair?

> +int qcow2_dirty_bitmap_store(BlockDriverState *bs, uint8_t *buf,
> +                            const char *name, uint64_t size,
> +                            int granularity)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int cl_size = s->cluster_size;
> +    int i, dirty_bitmap_index, ret = 0, n;
> +    uint64_t *l1_table;
> +    QCowDirtyBitmap *bm;
> +    uint64_t buf_size;
> +    uint8_t *p;
> +    int sector_granularity = granularity >> BDRV_SECTOR_BITS;
> +
> +    /* find/create dirty bitmap */
> +    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
> +    if (dirty_bitmap_index >= 0) {
> +        bm = s->dirty_bitmaps + dirty_bitmap_index;
> +
> +        if (size != bm->bitmap_size ||
> +            granularity != bm->bitmap_granularity) {
> +            qcow2_dirty_bitmap_delete(bs, name, NULL);

If this fails, we should 'return ret'.

> +            dirty_bitmap_index = -1;
> +        }
> +    }

Oh, find_dirty_bitmap_by_name only looks by name, but then you check to
make sure the size and granularity matches. If it doesn't, you actually
create a new bitmap with the *same name* but different attributes, and
delete the old one.

Is that appropriate? I guess if we're already here in store, it means we
made it past the add checks... which means for whatever reason we
definitely want to store *this* bitmap...

I think this code is a little extraneous, it might be best to just issue
an ultimatum that "You can't have two bitmaps with the same name in a
file." and let that be that -- finding something with the wrong size
would just simply be an error.

> +    if (dirty_bitmap_index < 0) {
> +        qcow2_dirty_bitmap_create(bs, name, size, granularity);

If this fails, we need to return ret immediately.

> +        dirty_bitmap_index = s->nb_dirty_bitmaps - 1;
> +    }
> +    bm = s->dirty_bitmaps + dirty_bitmap_index;
> +
> +    /* read l1 table */
> +    l1_table = g_malloc(bm->l1_size * sizeof(uint64_t));
> +    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
> +                     bm->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        goto finish;
> +    }
> +
> +    buf_size = (((size - 1) / sector_granularity) >> 3) + 1;
> +    buf_size = align_offset(buf_size, 4);
> +    n = buf_size / cl_size;
> +    p = buf;
> +    for (i = 0; i < bm->l1_size; ++i) {
> +        uint64_t addr = be64_to_cpu(l1_table[i]) & ~511;
> +        int write_size = (i == n ? (buf_size % cl_size) : cl_size);
> +
> +        if (buffer_is_zero(p, write_size)) {
> +            if (addr) {
> +                qcow2_free_clusters(bs, addr, cl_size,
> +                                    QCOW2_DISCARD_ALWAYS);
> +            }
> +            l1_table[i] = cpu_to_be64(1);
> +        } else {
> +            if (!addr) {
> +                addr = qcow2_alloc_clusters(bs, cl_size);
> +                l1_table[i] = cpu_to_be64(addr);
> +            }
> +
> +            ret = bdrv_pwrite(bs->file, addr, p, write_size);
> +            if (ret < 0) {
> +                goto finish;
> +            }
> +        }
> +
> +        p += cl_size;
> +    }
> +
> +    ret = bdrv_pwrite(bs->file, bm->l1_table_offset, l1_table,
> +                      bm->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        goto finish;
> +    }
> +
> +finish:
> +    g_free(l1_table);
> +    return ret;
> +}
> +/* if no id is provided, a new one is constructed */
> +int qcow2_dirty_bitmap_create(BlockDriverState *bs, const char *name,
> +                              uint64_t size, int granularity)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    QCowDirtyBitmap *new_dirty_bitmap_list = NULL;
> +    QCowDirtyBitmap *old_dirty_bitmap_list = NULL;
> +    QCowDirtyBitmap sn1, *bm = &sn1;
> +    int i, ret;
> +    uint64_t *l1_table = NULL;
> +    int64_t l1_table_offset;
> +    int sector_granularity = granularity >> BDRV_SECTOR_BITS;
> +
> +    if (s->nb_dirty_bitmaps >= QCOW_MAX_DIRTY_BITMAPS) {
> +        return -EFBIG;
> +    }
> +
> +    memset(bm, 0, sizeof(*bm));
> +
> +    /* Check that the ID is unique */
> +    if (find_dirty_bitmap_by_name(bs, name) >= 0) {
> +        return -EEXIST;
> +    }
> +
> +    /* Populate bm with passed data */
> +    bm->name = g_strdup(name);
> +    bm->bitmap_granularity = granularity;
> +    bm->bitmap_size = size;
> +
> +    bm->l1_size =
> +        size_to_clusters(s, (((size - 1) / sector_granularity) >> 3) + 1);
> +    l1_table_offset =
> +        qcow2_alloc_clusters(bs, s->l1_size * sizeof(uint64_t));

As mentioned previously (sorry for replying so many times to the same
patches, I've been making multiple passes over these patches and trying
to work my way through the qcow2 bits) I think the use of bm->l1_size
and s->l1_size is getting a little mixed up here and there.

Should this be bm->l1_size?

> +    if (l1_table_offset < 0) {
> +        ret = l1_table_offset;
> +        goto fail;
> +    }
> +    bm->l1_table_offset = l1_table_offset;
> +
> +    l1_table = g_try_new(uint64_t, bm->l1_size);
> +    if (l1_table == NULL) {
> +        ret = -ENOMEM;
> +        goto fail;
> +    }
> +
> +    /* initialize with zero clusters */
> +    for (i = 0; i < s->l1_size; i++) {

Here too?

> +        l1_table[i] = cpu_to_be64(1);
> +    }
> +
> +    ret = qcow2_pre_write_overlap_check(bs, 0, bm->l1_table_offset,
> +                                        s->l1_size * sizeof(uint64_t));

And here?

> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    ret = bdrv_pwrite(bs->file, bm->l1_table_offset, l1_table,
> +                      s->l1_size * sizeof(uint64_t));

Or here?

> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    g_free(l1_table);
> +    l1_table = NULL;
> +
> +    /* Append the new dirty bitmap to the dirty bitmap list */
> +    new_dirty_bitmap_list = g_new(QCowDirtyBitmap, s->nb_dirty_bitmaps + 1);
> +    if (s->dirty_bitmaps) {
> +        memcpy(new_dirty_bitmap_list, s->dirty_bitmaps,
> +               s->nb_dirty_bitmaps * sizeof(QCowDirtyBitmap));
> +        old_dirty_bitmap_list = s->dirty_bitmaps;
> +    }
> +    s->dirty_bitmaps = new_dirty_bitmap_list;
> +    s->dirty_bitmaps[s->nb_dirty_bitmaps++] = *bm;
> +
> +    ret = qcow2_write_dirty_bitmaps(bs);
> +    if (ret < 0) {
> +        g_free(s->dirty_bitmaps);
> +        s->dirty_bitmaps = old_dirty_bitmap_list;
> +        s->nb_dirty_bitmaps--;
> +        goto fail;
> +    }
> +
> +    g_free(old_dirty_bitmap_list);
> +
> +    return 0;
> +
> +fail:
> +    g_free(bm->name);
> +    g_free(l1_table);
> +
> +    return ret;
> +}
> +
> +static int qcow2_dirty_bitmap_free_clusters(BlockDriverState *bs,
> +                                            QCowDirtyBitmap *bm)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int ret, i;
> +    uint64_t *l1_table = g_new(uint64_t, bm->l1_size);
> +
> +    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
> +                     bm->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        g_free(l1_table);
> +        return ret;
> +    }
> +
> +    for (i = 0; i < bm->l1_size; ++i) {
> +        uint64_t addr = be64_to_cpu(l1_table[i]);
> +        qcow2_free_clusters(bs, addr, s->cluster_size, QCOW2_DISCARD_ALWAYS);
> +    }
> +
> +    qcow2_free_clusters(bs, bm->l1_table_offset, bm->l1_size * sizeof(uint64_t),
> +                        QCOW2_DISCARD_ALWAYS);
> +
> +    g_free(l1_table);
> +    return 0;
> +}
> +
> +int qcow2_dirty_bitmap_delete(BlockDriverState *bs,
> +                              const char *name,
> +                              Error **errp)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    QCowDirtyBitmap bm;
> +    int dirty_bitmap_index, ret = 0;
> +
> +    /* Search the dirty_bitmap */
> +    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
> +    if (dirty_bitmap_index < 0) {
> +        error_setg(errp, "Can't find the dirty bitmap");
> +        return -ENOENT;
> +    }
> +    bm = s->dirty_bitmaps[dirty_bitmap_index];
> +
> +    /* Remove it from the dirty_bitmap list */
> +    memmove(s->dirty_bitmaps + dirty_bitmap_index,
> +            s->dirty_bitmaps + dirty_bitmap_index + 1,
> +            (s->nb_dirty_bitmaps - dirty_bitmap_index - 1) * sizeof(bm));
> +    s->nb_dirty_bitmaps--;
> +    ret = qcow2_write_dirty_bitmaps(bs);
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret,
> +                         "Failed to remove dirty bitmap"
> +                         " from dirty bitmap list");
> +        return ret;
> +    }

What happens to the bitmap we didn't successfully delete at this point?
Seems to me that it's leaked and our internal representation of what's
in the file becomes inconsistent, no?

The only place we currently call this function, too, doesn't even check
the return code, so this isn't safe.

> +
> +    qcow2_dirty_bitmap_free_clusters(bs, &bm);
> +    g_free(bm.name);
> +
> +    return ret;
> +}
> diff --git a/block/qcow2.c b/block/qcow2.c
> index b9a72e3..406e55d 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -61,6 +61,7 @@ typedef struct {
>  #define  QCOW2_EXT_MAGIC_END 0
>  #define  QCOW2_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA
>  #define  QCOW2_EXT_MAGIC_FEATURE_TABLE 0x6803f857
> +#define  QCOW2_EXT_MAGIC_DIRTY_BITMAPS 0x23852875
>  
>  static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
>  {
> @@ -90,6 +91,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>      QCowExtension ext;
>      uint64_t offset;
>      int ret;
> +    Qcow2DirtyBitmapHeaderExt dirty_bitmaps_ext;
>  
>  #ifdef DEBUG_EXT
>      printf("qcow2_read_extensions: start=%ld end=%ld\n", start_offset, end_offset);
> @@ -160,6 +162,33 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>              }
>              break;
>  
> +        case QCOW2_EXT_MAGIC_DIRTY_BITMAPS:
> +            ret = bdrv_pread(bs->file, offset, &dirty_bitmaps_ext, ext.len);
> +            if (ret < 0) {
> +                error_setg_errno(errp, -ret, "ERROR: dirty_bitmaps_ext: "
> +                                 "Could not read ext header");
> +                return ret;
> +            }
> +
> +            be64_to_cpus(&dirty_bitmaps_ext.dirty_bitmaps_offset);
> +            be32_to_cpus(&dirty_bitmaps_ext.nb_dirty_bitmaps);
> +
> +            s->dirty_bitmaps_offset = dirty_bitmaps_ext.dirty_bitmaps_offset;
> +            s->nb_dirty_bitmaps = dirty_bitmaps_ext.nb_dirty_bitmaps;
> +
> +            ret = qcow2_read_dirty_bitmaps(bs);
> +            if (ret < 0) {
> +                error_setg_errno(errp, -ret, "Could not read dirty bitmaps");
> +                return ret;
> +            }
> +
> +#ifdef DEBUG_EXT
> +            printf("Qcow2: Got dirty bitmaps extension:"
> +                   " offset=%" PRIu64 " nb_bitmaps=%" PRIu32 "\n",
> +                   s->dirty_bitmaps_offset, s->nb_dirty_bitmaps);
> +#endif
> +            break;
> +
>          default:
>              /* unknown magic - save it in case we need to rewrite the header */
>              {
> @@ -1000,6 +1029,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>      g_free(s->unknown_header_fields);
>      cleanup_unknown_header_ext(bs);
>      qcow2_free_snapshots(bs);
> +    qcow2_free_dirty_bitmaps(bs);
>      qcow2_refcount_close(bs);
>      qemu_vfree(s->l1_table);
>      /* else pre-write overlap checks in cache_destroy may crash */
> @@ -1466,6 +1496,7 @@ static void qcow2_close(BlockDriverState *bs)
>      qemu_vfree(s->cluster_data);
>      qcow2_refcount_close(bs);
>      qcow2_free_snapshots(bs);
> +    qcow2_free_dirty_bitmaps(bs);
>  }
>  
>  static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
> @@ -1667,6 +1698,21 @@ int qcow2_update_header(BlockDriverState *bs)
>      buf += ret;
>      buflen -= ret;
>  
> +    if (s->nb_dirty_bitmaps > 0) {
> +        Qcow2DirtyBitmapHeaderExt dirty_bitmaps_header = {
> +            .nb_dirty_bitmaps = cpu_to_be32(s->nb_dirty_bitmaps),
> +            .dirty_bitmaps_offset = cpu_to_be64(s->dirty_bitmaps_offset)
> +        };
> +        ret = header_ext_add(buf, QCOW2_EXT_MAGIC_DIRTY_BITMAPS,
> +                             &dirty_bitmaps_header, sizeof(dirty_bitmaps_header),
> +                             buflen);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        buf += ret;
> +        buflen -= ret;
> +    }
> +
>      /* Keep unknown header extensions */
>      QLIST_FOREACH(uext, &s->unknown_header_ext, next) {
>          ret = header_ext_add(buf, uext->magic, uext->data, uext->len, buflen);
> @@ -2176,6 +2222,12 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset)
>          return -ENOTSUP;
>      }
>  
> +    /* cannot proceed if image has dirty_bitmaps */
> +    if (s->nb_dirty_bitmaps) {
> +        error_report("Can't resize an image which has dirty bitmaps");
> +        return -ENOTSUP;
> +    }
> +

Not true anymore: support for truncating a drive with dirty bitmaps was
indeed implemented.

In my mind it's OK if we limit this "temporarily" for now, but I will be
reviewing this series with the ability to change images in mind, as this
is something we should support eventually.

>      /* shrinking is currently not supported */
>      if (offset < bs->total_sectors * 512) {
>          error_report("qcow2 doesn't support shrinking images yet");
> @@ -2952,6 +3004,10 @@ BlockDriver bdrv_qcow2 = {
>      .bdrv_get_info          = qcow2_get_info,
>      .bdrv_get_specific_info = qcow2_get_specific_info,
>  
> +    .bdrv_dirty_bitmap_load = qcow2_dirty_bitmap_load,
> +    .bdrv_dirty_bitmap_store = qcow2_dirty_bitmap_store,
> +    .bdrv_dirty_bitmap_delete = qcow2_dirty_bitmap_delete,
> +
>      .bdrv_save_vmstate    = qcow2_save_vmstate,
>      .bdrv_load_vmstate    = qcow2_load_vmstate,
>  
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 422b825..24beee0 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -39,6 +39,7 @@
>  
>  #define QCOW_MAX_CRYPT_CLUSTERS 32
>  #define QCOW_MAX_SNAPSHOTS 65536
> +#define QCOW_MAX_DIRTY_BITMAPS 65536
>  
>  /* 8 MB refcount table is enough for 2 PB images at 64k cluster size
>   * (128 GB for 512 byte clusters, 2 EB for 2 MB clusters) */
> @@ -52,6 +53,8 @@
>   * space for snapshot names and IDs */
>  #define QCOW_MAX_SNAPSHOTS_SIZE (1024 * QCOW_MAX_SNAPSHOTS)
>  
> +#define QCOW_MAX_DIRTY_BITMAPS_SIZE (1024 * QCOW_MAX_DIRTY_BITMAPS)
> +
>  /* indicate that the refcount of the referenced cluster is exactly one. */
>  #define QCOW_OFLAG_COPIED     (1ULL << 63)
>  /* indicate that the cluster is compressed (they never have the copied flag) */
> @@ -138,6 +141,19 @@ typedef struct QEMU_PACKED QCowSnapshotHeader {
>      /* name follows  */
>  } QCowSnapshotHeader;
>  
> +typedef struct QEMU_PACKED QCowDirtyBitmapHeader {
> +    /* header is 8 byte aligned */
> +    uint64_t l1_table_offset;
> +
> +    uint32_t l1_size;
> +    uint32_t bitmap_granularity;
> +
> +    uint64_t bitmap_size;
> +    uint16_t name_size;
> +
> +    /* name follows  */
> +} QCowDirtyBitmapHeader;
> +
>  typedef struct QEMU_PACKED QCowSnapshotExtraData {
>      uint64_t vm_state_size_large;
>      uint64_t disk_size;
> @@ -156,6 +172,14 @@ typedef struct QCowSnapshot {
>      uint64_t vm_clock_nsec;
>  } QCowSnapshot;
>  
> +typedef struct QCowDirtyBitmap {
> +    uint64_t l1_table_offset;
> +    uint32_t l1_size;
> +    char *name;
> +    int bitmap_granularity;
> +    uint64_t bitmap_size;
> +} QCowDirtyBitmap;
> +
>  struct Qcow2Cache;
>  typedef struct Qcow2Cache Qcow2Cache;
>  
> @@ -218,6 +242,11 @@ typedef uint64_t Qcow2GetRefcountFunc(const void *refcount_array,
>  typedef void Qcow2SetRefcountFunc(void *refcount_array,
>                                    uint64_t index, uint64_t value);
>  
> +typedef struct Qcow2DirtyBitmapHeaderExt {
> +    uint32_t nb_dirty_bitmaps;
> +    uint64_t dirty_bitmaps_offset;
> +} QEMU_PACKED Qcow2DirtyBitmapHeaderExt;
> +
>  typedef struct BDRVQcowState {
>      int cluster_bits;
>      int cluster_size;
> @@ -259,6 +288,11 @@ typedef struct BDRVQcowState {
>      unsigned int nb_snapshots;
>      QCowSnapshot *snapshots;
>  
> +    uint64_t dirty_bitmaps_offset;
> +    int dirty_bitmaps_size;
> +    unsigned int nb_dirty_bitmaps;
> +    QCowDirtyBitmap *dirty_bitmaps;
> +
>      int flags;
>      int qcow_version;
>      bool use_lazy_refcounts;
> @@ -570,6 +604,22 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
>  void qcow2_free_snapshots(BlockDriverState *bs);
>  int qcow2_read_snapshots(BlockDriverState *bs);
>  
> +/* qcow2-dirty-bitmap.c functions */
> +int qcow2_dirty_bitmap_store(BlockDriverState *bs, uint8_t *buf,
> +                             const char *name, uint64_t size,
> +                             int granularity);
> +uint8_t *qcow2_dirty_bitmap_load(BlockDriverState *bs,
> +                                 const char *name, uint64_t size,
> +                                 int granularity);
> +int qcow2_dirty_bitmap_create(BlockDriverState *bs, const char *name,
> +                              uint64_t size, int granularity);
> +int qcow2_dirty_bitmap_delete(BlockDriverState *bs,
> +                              const char *name,
> +                              Error **errp);
> +
> +void qcow2_free_dirty_bitmaps(BlockDriverState *bs);
> +int qcow2_read_dirty_bitmaps(BlockDriverState *bs);
> +
>  /* qcow2-cache.c functions */
>  Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
>  int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index db29b74..88855b4 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -206,6 +206,16 @@ struct BlockDriver {
>      int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi);
>      ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs);
>  
> +    int (*bdrv_dirty_bitmap_store)(BlockDriverState *bs, uint8_t *buf,
> +                                   const char *name, uint64_t size,
> +                                   int granularity);
> +    uint8_t *(*bdrv_dirty_bitmap_load)(BlockDriverState *bs,
> +                                       const char *name, uint64_t size,
> +                                       int granularity);
> +    int (*bdrv_dirty_bitmap_delete)(BlockDriverState *bs,
> +                                    const char *name,
> +                                    Error **errp);
> +
>      int (*bdrv_save_vmstate)(BlockDriverState *bs, QEMUIOVector *qiov,
>                               int64_t pos);
>      int (*bdrv_load_vmstate)(BlockDriverState *bs, uint8_t *buf,
> 

OK, that's the last of my pending replies -- I think I'm done digging
through this series until a v3 shows up at this point, sorry for the
flood of mails :)

Thanks,
--John Snow

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature
  2015-06-11 23:04   ` John Snow
@ 2015-06-15 14:05     ` Vladimir Sementsov-Ogievskiy
  2015-06-15 16:53       ` John Snow
  0 siblings, 1 reply; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-15 14:05 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, pbonzini, Vladimir Sementsov-Ogievskiy, stefanha, den

On 12.06.2015 02:04, John Snow wrote:
>
> On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
>> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>
>> Adds dirty-bitmaps feature to qcow2 format as specified in
>> docs/specs/qcow2.txt
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/Makefile.objs        |   2 +-
>>   block/qcow2-dirty-bitmap.c | 503 +++++++++++++++++++++++++++++++++++++++++++++
>>   block/qcow2.c              |  56 +++++
>>   block/qcow2.h              |  50 +++++
>>   include/block/block_int.h  |  10 +
>>   5 files changed, 620 insertions(+), 1 deletion(-)
>>   create mode 100644 block/qcow2-dirty-bitmap.c
>>
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index 0d8c2a4..bff12b4 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -1,5 +1,5 @@
>>   block-obj-y += raw_bsd.o qcow.o vdi.o vmdk.o cloop.o bochs.o vpc.o vvfat.o
>> -block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
>> +block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o qcow2-dirty-bitmap.o
>>   block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>>   block-obj-y += qed-check.o
>>   block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
>> diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
>> new file mode 100644
>> index 0000000..bc0167c
>> --- /dev/null
>> +++ b/block/qcow2-dirty-bitmap.c
>> @@ -0,0 +1,503 @@
>> +/*
>> + * Dirty bitmpas for the QCOW version 2 format
>> + *
>> + * Copyright (c) 2014-2015 Vladimir Sementsov-Ogievskiy
>> + *
>> + * This file is derived from qcow2-snapshot.c, original copyright:
>> + * Copyright (c) 2004-2006 Fabrice Bellard
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>> + * of this software and associated documentation files (the "Software"), to deal
>> + * in the Software without restriction, including without limitation the rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#include "qemu-common.h"
>> +#include "block/block_int.h"
>> +#include "block/qcow2.h"
>> +
>> +void qcow2_free_dirty_bitmaps(BlockDriverState *bs)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int i;
>> +
>> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
>> +        g_free(s->dirty_bitmaps[i].name);
>> +    }
>> +    g_free(s->dirty_bitmaps);
>> +    s->dirty_bitmaps = NULL;
>> +    s->nb_dirty_bitmaps = 0;
>> +}
>> +
>> +int qcow2_read_dirty_bitmaps(BlockDriverState *bs)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    QCowDirtyBitmapHeader h;
>> +    QCowDirtyBitmap *bm;
>> +    int i, name_size;
>> +    int64_t offset;
>> +    int ret;
>> +
>> +    if (!s->nb_dirty_bitmaps) {
>> +        s->dirty_bitmaps = NULL;
>> +        s->dirty_bitmaps_size = 0;
>> +        return 0;
>> +    }
>> +
>> +    offset = s->dirty_bitmaps_offset;
>> +    s->dirty_bitmaps = g_new0(QCowDirtyBitmap, s->nb_dirty_bitmaps);
>> +
>> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
>> +        /* Read statically sized part of the dirty_bitmap header */
>> +        offset = align_offset(offset, 8);
>> +        ret = bdrv_pread(bs->file, offset, &h, sizeof(h));
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +
>> +        offset += sizeof(h);
>> +        bm = s->dirty_bitmaps + i;
>> +        bm->l1_table_offset = be64_to_cpu(h.l1_table_offset);
>> +        bm->l1_size = be32_to_cpu(h.l1_size);
>> +        bm->bitmap_granularity = be32_to_cpu(h.bitmap_granularity);
>> +        bm->bitmap_size = be64_to_cpu(h.bitmap_size);
>> +
>> +        name_size = be16_to_cpu(h.name_size);
>> +
>> +        /* Read dirty_bitmap name */
>> +        bm->name = g_malloc(name_size + 1);
>> +        ret = bdrv_pread(bs->file, offset, bm->name, name_size);
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +        offset += name_size;
>> +        bm->name[name_size] = '\0';
>> +
>> +        if (offset - s->dirty_bitmaps_offset > QCOW_MAX_DIRTY_BITMAPS_SIZE) {
>> +            ret = -EFBIG;
>> +            goto fail;
>> +        }
>> +    }
>> +
>> +    assert(offset - s->dirty_bitmaps_offset <= INT_MAX);
>> +    s->dirty_bitmaps_size = offset - s->dirty_bitmaps_offset;
>> +    return 0;
>> +
>> +fail:
>> +    qcow2_free_dirty_bitmaps(bs);
>> +    return ret;
>> +}
>> +
>> +/* Add at the end of the file a new table of dirty bitmaps */
>> +static int qcow2_write_dirty_bitmaps(BlockDriverState *bs)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    QCowDirtyBitmap *bm;
>> +    QCowDirtyBitmapHeader h;
>> +    int i, name_size, dirty_bitmaps_size;
>> +    int64_t offset, dirty_bitmaps_offset = 0;
>> +    int ret;
>> +
>> +    int old_dirty_bitmaps_size = s->dirty_bitmaps_size;
>> +    int64_t old_dirty_bitmaps_offset = s->dirty_bitmaps_offset;
>> +
>> +    /* Compute the size of the dirty bitmaps table */
>> +    offset = 0;
>> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
>> +        bm = s->dirty_bitmaps + i;
>> +        offset = align_offset(offset, 8);
>> +        offset += sizeof(h);
>> +        offset += strlen(bm->name);
>> +
>> +        if (offset > QCOW_MAX_DIRTY_BITMAPS_SIZE) {
>> +            ret = -EFBIG;
>> +            goto fail;
>> +        }
>> +    }
>> +
>> +    assert(offset <= INT_MAX);
>> +    dirty_bitmaps_size = offset;
>> +
>> +    /* Allocate space for the new dirty bitmap table */
>> +    dirty_bitmaps_offset = qcow2_alloc_clusters(bs, dirty_bitmaps_size);
>> +    offset = dirty_bitmaps_offset;
>> +    if (offset < 0) {
>> +        ret = offset;
>> +        goto fail;
>> +    }
>> +    ret = bdrv_flush(bs);
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    /* The dirty bitmap table position has not yet been updated, so these
>> +     * clusters must indeed be completely free */
>> +    ret = qcow2_pre_write_overlap_check(bs, 0, offset, dirty_bitmaps_size);
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    /* Write all dirty bitmaps to the new table */
>> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
>> +        bm = s->dirty_bitmaps + i;
>> +        memset(&h, 0, sizeof(h));
>> +        h.l1_table_offset = cpu_to_be64(bm->l1_table_offset);
>> +        h.l1_size = cpu_to_be32(bm->l1_size);
>> +        h.bitmap_granularity = cpu_to_be32(bm->bitmap_granularity);
>> +        h.bitmap_size = cpu_to_be64(bm->bitmap_size);
>> +
>> +        name_size = strlen(bm->name);
>> +        assert(name_size <= UINT16_MAX);
>> +        h.name_size = cpu_to_be16(name_size);
>> +        offset = align_offset(offset, 8);
>> +
>> +        ret = bdrv_pwrite(bs->file, offset, &h, sizeof(h));
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +        offset += sizeof(h);
>> +
>> +        ret = bdrv_pwrite(bs->file, offset, bm->name, name_size);
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +        offset += name_size;
>> +    }
>> +
>> +    /*
>> +     * Update the header extension to point to the new dirty bitmap table. This
>> +     * requires the new table and its refcounts to be stable on disk.
>> +     */
>> +    ret = bdrv_flush(bs);
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    s->dirty_bitmaps_offset = dirty_bitmaps_offset;
>> +    s->dirty_bitmaps_size = dirty_bitmaps_size;
>> +    ret = qcow2_update_header(bs);
>> +    if (ret < 0) {
>> +        fprintf(stderr, "Could not update qcow2 header\n");
>> +        goto fail;
>> +    }
>> +
>> +    /* Free old dirty bitmap table */
>> +    qcow2_free_clusters(bs, old_dirty_bitmaps_offset, old_dirty_bitmaps_size,
>> +                        QCOW2_DISCARD_ALWAYS);
>> +    return 0;
>> +
>> +fail:
>> +    if (dirty_bitmaps_offset > 0) {
>> +        qcow2_free_clusters(bs, dirty_bitmaps_offset, dirty_bitmaps_size,
>> +                            QCOW2_DISCARD_ALWAYS);
>> +    }
>> +    return ret;
>> +}
>> +
>> +static int find_dirty_bitmap_by_name(BlockDriverState *bs,
>> +                                     const char *name)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int i;
>> +
>> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
>> +        if (!strcmp(s->dirty_bitmaps[i].name, name)) {
>> +            return i;
>> +        }
>> +    }
>> +
>> +    return -1;
>> +}
>> +
>> +uint8_t *qcow2_dirty_bitmap_load(BlockDriverState *bs,
>> +                            const char *name, uint64_t size,
>> +                            int granularity)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int i, dirty_bitmap_index, ret;
>> +    uint64_t offset;
>> +    QCowDirtyBitmap *bm;
>> +    uint64_t *l1_table;
>> +    uint8_t *buf;
>> +
>> +    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
>> +    if (dirty_bitmap_index < 0) {
>> +        return NULL;
>> +    }
>> +    bm = &s->dirty_bitmaps[dirty_bitmap_index];
>> +
>> +    if (size != bm->bitmap_size || granularity != bm->bitmap_granularity) {
>> +        return NULL;
>> +    }
>> +
>> +    l1_table = g_malloc(bm->l1_size * sizeof(uint64_t));
>> +    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
>> +                     bm->l1_size * sizeof(uint64_t));
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    buf = g_malloc0(bm->l1_size * s->cluster_size);
>> +    for (i = 0; i < bm->l1_size; ++i) {
>> +        offset = be64_to_cpu(l1_table[i]);
>> +        if (!(offset & 1)) {
>> +            ret = bdrv_pread(bs->file, offset, buf + i * s->cluster_size,
>> +                             s->cluster_size);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        }
>> +    }
>> +
>> +    g_free(l1_table);
>> +    return buf;
>> +
>> +fail:
>> +    g_free(l1_table);
>> +    return NULL;
>> +}
>> +
>> +int qcow2_dirty_bitmap_store(BlockDriverState *bs, uint8_t *buf,
>> +                            const char *name, uint64_t size,
>> +                            int granularity)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int cl_size = s->cluster_size;
>> +    int i, dirty_bitmap_index, ret = 0, n;
>> +    uint64_t *l1_table;
>> +    QCowDirtyBitmap *bm;
>> +    uint64_t buf_size;
>> +    uint8_t *p;
>> +    int sector_granularity = granularity >> BDRV_SECTOR_BITS;
>> +
>> +    /* find/create dirty bitmap */
>> +    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
>> +    if (dirty_bitmap_index >= 0) {
>> +        bm = s->dirty_bitmaps + dirty_bitmap_index;
>> +
>> +        if (size != bm->bitmap_size ||
>> +            granularity != bm->bitmap_granularity) {
>> +            qcow2_dirty_bitmap_delete(bs, name, NULL);
>> +            dirty_bitmap_index = -1;
>> +        }
>> +    }
>> +    if (dirty_bitmap_index < 0) {
>> +        qcow2_dirty_bitmap_create(bs, name, size, granularity);
>> +        dirty_bitmap_index = s->nb_dirty_bitmaps - 1;
>> +    }
>> +    bm = s->dirty_bitmaps + dirty_bitmap_index;
> I catch a segfault right around here if I do the following:
>
> ./x86_64-softmmu/qemu-system-x86_64 --dirty-bitmap
> file=bitmaps.qcow2,name=bitmap0,drive=drive0 -drive
> if=none,file=hda.qcow2,id=drive0 -device ide-hd,drive=drive0
>
> hda.qcow2 and bitmaps.qcow2 are both empty files, but bitmaps.qcow2 has
> a size of '0'.
empty file or qcow2 files of size 0 (with header) ?
>
> Then when I click close in the QEMU GTK frontend, we hit a segfault when
> trying to close because s->dirty_bitmaps is NULL, because it appears as
> if we've never actually tried to add the (empty) bitmap to the (empty) file.
>
> Your iotest works, but I am not actually sure why, because I don't
> actually know how to *create* a persistent bitmap. I thought that the
> -dirty-bitmap CLI would create one in the file specified with file=, but
> it apparently only creates an in-memory bitmap and sets the file
> pointer, but never initializes any of these structures. Then, when we go
> to close, it gets confused and everything breaks a bit.
>
>> +
>> +    /* read l1 table */
>> +    l1_table = g_malloc(bm->l1_size * sizeof(uint64_t));
>> +    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
>> +                     bm->l1_size * sizeof(uint64_t));
>> +    if (ret < 0) {
>> +        goto finish;
>> +    }
>> +
>> +    buf_size = (((size - 1) / sector_granularity) >> 3) + 1;
>> +    buf_size = align_offset(buf_size, 4);
>> +    n = buf_size / cl_size;
>> +    p = buf;
>> +    for (i = 0; i < bm->l1_size; ++i) {
>> +        uint64_t addr = be64_to_cpu(l1_table[i]) & ~511;
>> +        int write_size = (i == n ? (buf_size % cl_size) : cl_size);
>> +
>> +        if (buffer_is_zero(p, write_size)) {
>> +            if (addr) {
>> +                qcow2_free_clusters(bs, addr, cl_size,
>> +                                    QCOW2_DISCARD_ALWAYS);
>> +            }
>> +            l1_table[i] = cpu_to_be64(1);
>> +        } else {
>> +            if (!addr) {
>> +                addr = qcow2_alloc_clusters(bs, cl_size);
>> +                l1_table[i] = cpu_to_be64(addr);
>> +            }
>> +
>> +            ret = bdrv_pwrite(bs->file, addr, p, write_size);
>> +            if (ret < 0) {
>> +                goto finish;
>> +            }
>> +        }
>> +
>> +        p += cl_size;
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs->file, bm->l1_table_offset, l1_table,
>> +                      bm->l1_size * sizeof(uint64_t));
>> +    if (ret < 0) {
>> +        goto finish;
>> +    }
>> +
>> +finish:
>> +    g_free(l1_table);
>> +    return ret;
>> +}
>> +/* if no id is provided, a new one is constructed */
>> +int qcow2_dirty_bitmap_create(BlockDriverState *bs, const char *name,
>> +                              uint64_t size, int granularity)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    QCowDirtyBitmap *new_dirty_bitmap_list = NULL;
>> +    QCowDirtyBitmap *old_dirty_bitmap_list = NULL;
>> +    QCowDirtyBitmap sn1, *bm = &sn1;
>> +    int i, ret;
>> +    uint64_t *l1_table = NULL;
>> +    int64_t l1_table_offset;
>> +    int sector_granularity = granularity >> BDRV_SECTOR_BITS;
>> +
>> +    if (s->nb_dirty_bitmaps >= QCOW_MAX_DIRTY_BITMAPS) {
>> +        return -EFBIG;
>> +    }
>> +
>> +    memset(bm, 0, sizeof(*bm));
>> +
>> +    /* Check that the ID is unique */
>> +    if (find_dirty_bitmap_by_name(bs, name) >= 0) {
>> +        return -EEXIST;
>> +    }
>> +
>> +    /* Populate bm with passed data */
>> +    bm->name = g_strdup(name);
>> +    bm->bitmap_granularity = granularity;
>> +    bm->bitmap_size = size;
>> +
>> +    bm->l1_size =
>> +        size_to_clusters(s, (((size - 1) / sector_granularity) >> 3) + 1);
>> +    l1_table_offset =
>> +        qcow2_alloc_clusters(bs, s->l1_size * sizeof(uint64_t));
>> +    if (l1_table_offset < 0) {
>> +        ret = l1_table_offset;
>> +        goto fail;
>> +    }
>> +    bm->l1_table_offset = l1_table_offset;
>> +
>> +    l1_table = g_try_new(uint64_t, bm->l1_size);
>> +    if (l1_table == NULL) {
>> +        ret = -ENOMEM;
>> +        goto fail;
>> +    }
>> +
>> +    /* initialize with zero clusters */
>> +    for (i = 0; i < s->l1_size; i++) {
>> +        l1_table[i] = cpu_to_be64(1);
> bm->l1_size here in my crash output is just "1",
> but s->l1_size is 16, so we crash all over this array.
>
> I assume you meant bm->l1_size here. This is a good case to make against
> calling everything "L1."
>
>> +    }
>> +
>> +    ret = qcow2_pre_write_overlap_check(bs, 0, bm->l1_table_offset,
>> +                                        s->l1_size * sizeof(uint64_t));
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs->file, bm->l1_table_offset, l1_table,
>> +                      s->l1_size * sizeof(uint64_t));
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    g_free(l1_table);
> I can also catch a segfault here by doing something like this:
>
> ./x86_64-softmmu/qemu-system-x86_64 -drive
> if=none,format=qcow2,cache=writethrough,file=hda.qcow2,id=drive0
> -dirty-bitmap name=bitmap,drive=drive0
>
> Trying to mimick your iotest which does not use an external bitmap file
> -- it uses the implicit self-storage.
>
> In this case, hda.qcow2 is still empty (but sized as 8GB) and I try to
> quit before any writes occur.
>
> freeing l1_table here causes memory corruption and even valgrind goes
> down in flames:
>
> ==13284== Invalid write of size 8
> ==13284==    at 0x53A15E: qcow2_dirty_bitmap_create
> (qcow2-dirty-bitmap.c:406)
> ==13284==    by 0x539D15: qcow2_dirty_bitmap_store
> (qcow2-dirty-bitmap.c:307)
> ==13284==    by 0x505F27: bdrv_store_dirty_bitmap (block.c:3176)
> ==13284==    by 0x50306D: bdrv_close (block.c:1739)
> ==13284==    by 0x5032FF: bdrv_close_all (block.c:1797)
> ==13284==    by 0x3049DC: main (vl.c:4577)
> ==13284==  Address 0x239b7978 is 0 bytes after a block of size 8 alloc'd
> ==13284==    at 0x4A06BCF: malloc (vg_replace_malloc.c:296)
> ==13284==    by 0x300111: malloc_and_trace (vl.c:2706)
> ==13284==    by 0x62B954E: g_try_malloc (gmem.c:242)
> ==13284==    by 0x53A11E: qcow2_dirty_bitmap_create
> (qcow2-dirty-bitmap.c:398)
> ==13284==    by 0x539D15: qcow2_dirty_bitmap_store
> (qcow2-dirty-bitmap.c:307)
> ==13284==    by 0x505F27: bdrv_store_dirty_bitmap (block.c:3176)
> ==13284==    by 0x50306D: bdrv_close (block.c:1739)
> ==13284==    by 0x5032FF: bdrv_close_all (block.c:1797)
> ==13284==    by 0x3049DC: main (vl.c:4577)
> ==13284==
> --13284-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11
> (SIGSEGV) - exiting
> --13284-- si_code=80;  Faulting address: 0x0;  sp: 0x8090a1de0
>
> valgrind: the 'impossible' happened:
>     Killed by fatal signal
>
>> +    l1_table = NULL;
>> +
>> +    /* Append the new dirty bitmap to the dirty bitmap list */
>> +    new_dirty_bitmap_list = g_new(QCowDirtyBitmap, s->nb_dirty_bitmaps + 1);
>> +    if (s->dirty_bitmaps) {
>> +        memcpy(new_dirty_bitmap_list, s->dirty_bitmaps,
>> +               s->nb_dirty_bitmaps * sizeof(QCowDirtyBitmap));
>> +        old_dirty_bitmap_list = s->dirty_bitmaps;
>> +    }
>> +    s->dirty_bitmaps = new_dirty_bitmap_list;
>> +    s->dirty_bitmaps[s->nb_dirty_bitmaps++] = *bm;
>> +
>> +    ret = qcow2_write_dirty_bitmaps(bs);
>> +    if (ret < 0) {
>> +        g_free(s->dirty_bitmaps);
>> +        s->dirty_bitmaps = old_dirty_bitmap_list;
>> +        s->nb_dirty_bitmaps--;
>> +        goto fail;
>> +    }
>> +
>> +    g_free(old_dirty_bitmap_list);
>> +
>> +    return 0;
>> +
> Disk is 8GiB, 16,777,216 sectors, and bm->bitmap_size matches that.
>> +fail:
>> +    g_free(bm->name);
>> +    g_free(l1_table);
>> +
>> +    return ret;
>> +}
>> +
>> +static int qcow2_dirty_bitmap_free_clusters(BlockDriverState *bs,
>> +                                            QCowDirtyBitmap *bm)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int ret, i;
>> +    uint64_t *l1_table = g_new(uint64_t, bm->l1_size);
>> +
>> +    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
>> +                     bm->l1_size * sizeof(uint64_t));
>> +    if (ret < 0) {
>> +        g_free(l1_table);
>> +        return ret;
>> +    }
>> +
>> +    for (i = 0; i < bm->l1_size; ++i) {
>> +        uint64_t addr = be64_to_cpu(l1_table[i]);
>> +        qcow2_free_clusters(bs, addr, s->cluster_size, QCOW2_DISCARD_ALWAYS);
>> +    }
>> +
>> +    qcow2_free_clusters(bs, bm->l1_table_offset, bm->l1_size * sizeof(uint64_t),
>> +                        QCOW2_DISCARD_ALWAYS);
>> +
>> +    g_free(l1_table);
>> +    return 0;
>> +}
>> +
>> +int qcow2_dirty_bitmap_delete(BlockDriverState *bs,
>> +                              const char *name,
>> +                              Error **errp)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    QCowDirtyBitmap bm;
>> +    int dirty_bitmap_index, ret = 0;
>> +
>> +    /* Search the dirty_bitmap */
>> +    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
>> +    if (dirty_bitmap_index < 0) {
>> +        error_setg(errp, "Can't find the dirty bitmap");
>> +        return -ENOENT;
>> +    }
>> +    bm = s->dirty_bitmaps[dirty_bitmap_index];
>> +
>> +    /* Remove it from the dirty_bitmap list */
>> +    memmove(s->dirty_bitmaps + dirty_bitmap_index,
>> +            s->dirty_bitmaps + dirty_bitmap_index + 1,
>> +            (s->nb_dirty_bitmaps - dirty_bitmap_index - 1) * sizeof(bm));
>> +    s->nb_dirty_bitmaps--;
>> +    ret = qcow2_write_dirty_bitmaps(bs);
>> +    if (ret < 0) {
>> +        error_setg_errno(errp, -ret,
>> +                         "Failed to remove dirty bitmap"
>> +                         " from dirty bitmap list");
>> +        return ret;
>> +    }
>> +
>> +    qcow2_dirty_bitmap_free_clusters(bs, &bm);
>> +    g_free(bm.name);
>> +
>> +    return ret;
>> +}
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index b9a72e3..406e55d 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -61,6 +61,7 @@ typedef struct {
>>   #define  QCOW2_EXT_MAGIC_END 0
>>   #define  QCOW2_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA
>>   #define  QCOW2_EXT_MAGIC_FEATURE_TABLE 0x6803f857
>> +#define  QCOW2_EXT_MAGIC_DIRTY_BITMAPS 0x23852875
>>   
>>   static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
>>   {
>> @@ -90,6 +91,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>>       QCowExtension ext;
>>       uint64_t offset;
>>       int ret;
>> +    Qcow2DirtyBitmapHeaderExt dirty_bitmaps_ext;
>>   
>>   #ifdef DEBUG_EXT
>>       printf("qcow2_read_extensions: start=%ld end=%ld\n", start_offset, end_offset);
>> @@ -160,6 +162,33 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>>               }
>>               break;
>>   
>> +        case QCOW2_EXT_MAGIC_DIRTY_BITMAPS:
>> +            ret = bdrv_pread(bs->file, offset, &dirty_bitmaps_ext, ext.len);
>> +            if (ret < 0) {
>> +                error_setg_errno(errp, -ret, "ERROR: dirty_bitmaps_ext: "
>> +                                 "Could not read ext header");
>> +                return ret;
>> +            }
>> +
>> +            be64_to_cpus(&dirty_bitmaps_ext.dirty_bitmaps_offset);
>> +            be32_to_cpus(&dirty_bitmaps_ext.nb_dirty_bitmaps);
>> +
>> +            s->dirty_bitmaps_offset = dirty_bitmaps_ext.dirty_bitmaps_offset;
>> +            s->nb_dirty_bitmaps = dirty_bitmaps_ext.nb_dirty_bitmaps;
>> +
>> +            ret = qcow2_read_dirty_bitmaps(bs);
>> +            if (ret < 0) {
>> +                error_setg_errno(errp, -ret, "Could not read dirty bitmaps");
>> +                return ret;
>> +            }
>> +
>> +#ifdef DEBUG_EXT
>> +            printf("Qcow2: Got dirty bitmaps extension:"
>> +                   " offset=%" PRIu64 " nb_bitmaps=%" PRIu32 "\n",
>> +                   s->dirty_bitmaps_offset, s->nb_dirty_bitmaps);
>> +#endif
>> +            break;
>> +
>>           default:
>>               /* unknown magic - save it in case we need to rewrite the header */
>>               {
>> @@ -1000,6 +1029,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>>       g_free(s->unknown_header_fields);
>>       cleanup_unknown_header_ext(bs);
>>       qcow2_free_snapshots(bs);
>> +    qcow2_free_dirty_bitmaps(bs);
>>       qcow2_refcount_close(bs);
>>       qemu_vfree(s->l1_table);
>>       /* else pre-write overlap checks in cache_destroy may crash */
>> @@ -1466,6 +1496,7 @@ static void qcow2_close(BlockDriverState *bs)
>>       qemu_vfree(s->cluster_data);
>>       qcow2_refcount_close(bs);
>>       qcow2_free_snapshots(bs);
>> +    qcow2_free_dirty_bitmaps(bs);
>>   }
>>   
>>   static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
>> @@ -1667,6 +1698,21 @@ int qcow2_update_header(BlockDriverState *bs)
>>       buf += ret;
>>       buflen -= ret;
>>   
>> +    if (s->nb_dirty_bitmaps > 0) {
>> +        Qcow2DirtyBitmapHeaderExt dirty_bitmaps_header = {
>> +            .nb_dirty_bitmaps = cpu_to_be32(s->nb_dirty_bitmaps),
>> +            .dirty_bitmaps_offset = cpu_to_be64(s->dirty_bitmaps_offset)
>> +        };
>> +        ret = header_ext_add(buf, QCOW2_EXT_MAGIC_DIRTY_BITMAPS,
>> +                             &dirty_bitmaps_header, sizeof(dirty_bitmaps_header),
>> +                             buflen);
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +        buf += ret;
>> +        buflen -= ret;
>> +    }
>> +
>>       /* Keep unknown header extensions */
>>       QLIST_FOREACH(uext, &s->unknown_header_ext, next) {
>>           ret = header_ext_add(buf, uext->magic, uext->data, uext->len, buflen);
>> @@ -2176,6 +2222,12 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset)
>>           return -ENOTSUP;
>>       }
>>   
>> +    /* cannot proceed if image has dirty_bitmaps */
>> +    if (s->nb_dirty_bitmaps) {
>> +        error_report("Can't resize an image which has dirty bitmaps");
>> +        return -ENOTSUP;
>> +    }
>> +
>>       /* shrinking is currently not supported */
>>       if (offset < bs->total_sectors * 512) {
>>           error_report("qcow2 doesn't support shrinking images yet");
>> @@ -2952,6 +3004,10 @@ BlockDriver bdrv_qcow2 = {
>>       .bdrv_get_info          = qcow2_get_info,
>>       .bdrv_get_specific_info = qcow2_get_specific_info,
>>   
>> +    .bdrv_dirty_bitmap_load = qcow2_dirty_bitmap_load,
>> +    .bdrv_dirty_bitmap_store = qcow2_dirty_bitmap_store,
>> +    .bdrv_dirty_bitmap_delete = qcow2_dirty_bitmap_delete,
>> +
>>       .bdrv_save_vmstate    = qcow2_save_vmstate,
>>       .bdrv_load_vmstate    = qcow2_load_vmstate,
>>   
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index 422b825..24beee0 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -39,6 +39,7 @@
>>   
>>   #define QCOW_MAX_CRYPT_CLUSTERS 32
>>   #define QCOW_MAX_SNAPSHOTS 65536
>> +#define QCOW_MAX_DIRTY_BITMAPS 65536
>>   
>>   /* 8 MB refcount table is enough for 2 PB images at 64k cluster size
>>    * (128 GB for 512 byte clusters, 2 EB for 2 MB clusters) */
>> @@ -52,6 +53,8 @@
>>    * space for snapshot names and IDs */
>>   #define QCOW_MAX_SNAPSHOTS_SIZE (1024 * QCOW_MAX_SNAPSHOTS)
>>   
>> +#define QCOW_MAX_DIRTY_BITMAPS_SIZE (1024 * QCOW_MAX_DIRTY_BITMAPS)
>> +
>>   /* indicate that the refcount of the referenced cluster is exactly one. */
>>   #define QCOW_OFLAG_COPIED     (1ULL << 63)
>>   /* indicate that the cluster is compressed (they never have the copied flag) */
>> @@ -138,6 +141,19 @@ typedef struct QEMU_PACKED QCowSnapshotHeader {
>>       /* name follows  */
>>   } QCowSnapshotHeader;
>>   
>> +typedef struct QEMU_PACKED QCowDirtyBitmapHeader {
>> +    /* header is 8 byte aligned */
>> +    uint64_t l1_table_offset;
>> +
>> +    uint32_t l1_size;
>> +    uint32_t bitmap_granularity;
>> +
>> +    uint64_t bitmap_size;
>> +    uint16_t name_size;
>> +
>> +    /* name follows  */
>> +} QCowDirtyBitmapHeader;
>> +
>>   typedef struct QEMU_PACKED QCowSnapshotExtraData {
>>       uint64_t vm_state_size_large;
>>       uint64_t disk_size;
>> @@ -156,6 +172,14 @@ typedef struct QCowSnapshot {
>>       uint64_t vm_clock_nsec;
>>   } QCowSnapshot;
>>   
>> +typedef struct QCowDirtyBitmap {
>> +    uint64_t l1_table_offset;
>> +    uint32_t l1_size;
>> +    char *name;
>> +    int bitmap_granularity;
>> +    uint64_t bitmap_size;
>> +} QCowDirtyBitmap;
>> +
>>   struct Qcow2Cache;
>>   typedef struct Qcow2Cache Qcow2Cache;
>>   
>> @@ -218,6 +242,11 @@ typedef uint64_t Qcow2GetRefcountFunc(const void *refcount_array,
>>   typedef void Qcow2SetRefcountFunc(void *refcount_array,
>>                                     uint64_t index, uint64_t value);
>>   
>> +typedef struct Qcow2DirtyBitmapHeaderExt {
>> +    uint32_t nb_dirty_bitmaps;
>> +    uint64_t dirty_bitmaps_offset;
>> +} QEMU_PACKED Qcow2DirtyBitmapHeaderExt;
>> +
>>   typedef struct BDRVQcowState {
>>       int cluster_bits;
>>       int cluster_size;
>> @@ -259,6 +288,11 @@ typedef struct BDRVQcowState {
>>       unsigned int nb_snapshots;
>>       QCowSnapshot *snapshots;
>>   
>> +    uint64_t dirty_bitmaps_offset;
>> +    int dirty_bitmaps_size;
>> +    unsigned int nb_dirty_bitmaps;
>> +    QCowDirtyBitmap *dirty_bitmaps;
>> +
>>       int flags;
>>       int qcow_version;
>>       bool use_lazy_refcounts;
>> @@ -570,6 +604,22 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
>>   void qcow2_free_snapshots(BlockDriverState *bs);
>>   int qcow2_read_snapshots(BlockDriverState *bs);
>>   
>> +/* qcow2-dirty-bitmap.c functions */
>> +int qcow2_dirty_bitmap_store(BlockDriverState *bs, uint8_t *buf,
>> +                             const char *name, uint64_t size,
>> +                             int granularity);
>> +uint8_t *qcow2_dirty_bitmap_load(BlockDriverState *bs,
>> +                                 const char *name, uint64_t size,
>> +                                 int granularity);
>> +int qcow2_dirty_bitmap_create(BlockDriverState *bs, const char *name,
>> +                              uint64_t size, int granularity);
>> +int qcow2_dirty_bitmap_delete(BlockDriverState *bs,
>> +                              const char *name,
>> +                              Error **errp);
>> +
>> +void qcow2_free_dirty_bitmaps(BlockDriverState *bs);
>> +int qcow2_read_dirty_bitmaps(BlockDriverState *bs);
>> +
>>   /* qcow2-cache.c functions */
>>   Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
>>   int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index db29b74..88855b4 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -206,6 +206,16 @@ struct BlockDriver {
>>       int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi);
>>       ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs);
>>   
>> +    int (*bdrv_dirty_bitmap_store)(BlockDriverState *bs, uint8_t *buf,
>> +                                   const char *name, uint64_t size,
>> +                                   int granularity);
>> +    uint8_t *(*bdrv_dirty_bitmap_load)(BlockDriverState *bs,
>> +                                       const char *name, uint64_t size,
>> +                                       int granularity);
>> +    int (*bdrv_dirty_bitmap_delete)(BlockDriverState *bs,
>> +                                    const char *name,
>> +                                    Error **errp);
>> +
>>       int (*bdrv_save_vmstate)(BlockDriverState *bs, QEMUIOVector *qiov,
>>                                int64_t pos);
>>       int (*bdrv_load_vmstate)(BlockDriverState *bs, uint8_t *buf,
>>
>
> In light of this, some "sanity" tests that test cases like no writes,
> empty bitmaps, empty files, etc I think will be appropriate.
>
>


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature
  2015-06-12 19:02     ` John Snow
@ 2015-06-15 14:42       ` Stefan Hajnoczi
  2015-06-23 17:57         ` John Snow
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2015-06-15 14:42 UTC (permalink / raw)
  To: John Snow
  Cc: kwolf, Vladimir Sementsov-Ogievskiy, qemu-devel,
	Vladimir Sementsov-Ogievskiy, den, pbonzini

[-- Attachment #1: Type: text/plain, Size: 860 bytes --]

On Fri, Jun 12, 2015 at 03:02:33PM -0400, John Snow wrote:
> 
> 
> On 06/10/2015 10:30 AM, Stefan Hajnoczi wrote:
> > On Mon, Jun 08, 2015 at 06:21:20PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > 
> > I noticed a corner case, it's probably not a problem in practice:
> > 
> > Since the dirty bitmap is stored with the help of a BlockDriverState
> > (and its bs->file), it's possible that writing the bitmap will cause
> > bits in the bitmap to be dirtied!
> > 
> 
> But since it's metadata and not stored within a disk sector, can this
> actually happen? Do you have an example of a scenario where this might
> come up?

The persistent dirty bitmap for bs->file is storeed in the qcow2 BDS.
This results in recursion.

This is a misconfiguration but I just want to understand what happens
when someone does this by mistake.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature
  2015-06-15 14:05     ` Vladimir Sementsov-Ogievskiy
@ 2015-06-15 16:53       ` John Snow
  0 siblings, 0 replies; 76+ messages in thread
From: John Snow @ 2015-06-15 16:53 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, pbonzini, Vladimir Sementsov-Ogievskiy, stefanha, den



On 06/15/2015 10:05 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 12.06.2015 02:04, John Snow wrote:
>>
>> On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>>
>>> Adds dirty-bitmaps feature to qcow2 format as specified in
>>> docs/specs/qcow2.txt
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/Makefile.objs        |   2 +-
>>>   block/qcow2-dirty-bitmap.c | 503
>>> +++++++++++++++++++++++++++++++++++++++++++++
>>>   block/qcow2.c              |  56 +++++
>>>   block/qcow2.h              |  50 +++++
>>>   include/block/block_int.h  |  10 +
>>>   5 files changed, 620 insertions(+), 1 deletion(-)
>>>   create mode 100644 block/qcow2-dirty-bitmap.c
>>>

[snip]

>>> +int qcow2_dirty_bitmap_store(BlockDriverState *bs, uint8_t *buf,
>>> +                            const char *name, uint64_t size,
>>> +                            int granularity)
>>> +{
>>> +    BDRVQcowState *s = bs->opaque;
>>> +    int cl_size = s->cluster_size;
>>> +    int i, dirty_bitmap_index, ret = 0, n;
>>> +    uint64_t *l1_table;
>>> +    QCowDirtyBitmap *bm;
>>> +    uint64_t buf_size;
>>> +    uint8_t *p;
>>> +    int sector_granularity = granularity >> BDRV_SECTOR_BITS;
>>> +
>>> +    /* find/create dirty bitmap */
>>> +    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
>>> +    if (dirty_bitmap_index >= 0) {
>>> +        bm = s->dirty_bitmaps + dirty_bitmap_index;
>>> +
>>> +        if (size != bm->bitmap_size ||
>>> +            granularity != bm->bitmap_granularity) {
>>> +            qcow2_dirty_bitmap_delete(bs, name, NULL);
>>> +            dirty_bitmap_index = -1;
>>> +        }
>>> +    }
>>> +    if (dirty_bitmap_index < 0) {
>>> +        qcow2_dirty_bitmap_create(bs, name, size, granularity);
>>> +        dirty_bitmap_index = s->nb_dirty_bitmaps - 1;
>>> +    }
>>> +    bm = s->dirty_bitmaps + dirty_bitmap_index;
>> I catch a segfault right around here if I do the following:
>>
>> ./x86_64-softmmu/qemu-system-x86_64 --dirty-bitmap
>> file=bitmaps.qcow2,name=bitmap0,drive=drive0 -drive
>> if=none,file=hda.qcow2,id=drive0 -device ide-hd,drive=drive0
>>
>> hda.qcow2 and bitmaps.qcow2 are both empty files, but bitmaps.qcow2 has
>> a size of '0'.
> empty file or qcow2 files of size 0 (with header) ?

Sorry, that was ambiguous. A properly formatted qcow2 file with a size of 0.

I tried two configurations:

1) Saving a bitmap to the file it describes, using e.g. hda.qcow2, which
is an 8GiB qcow2 file that has been formatted, but contains no
allocations yet.

2) Saving a bitmap to a bitmaps.qcow2 file, which is a formatted qcow2
with a size of 0, to describe a file hda.qcow2 which is 8GiB and has no
allocations either.

Both crash, because I think you are confusing s->l1_size with
bm->l1_size during the storage routine.

The following patch fixes my initial issues with the series, at least at
a cursory glance:

commit 824ed0e9f56425c98ec600abb0e31791d12e628f
Author: John Snow <jsnow@redhat.com>
Date:   Mon Jun 15 12:06:43 2015 -0400

    fix persistence

diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
index 686a121..eaaec14 100644
--- a/block/qcow2-dirty-bitmap.c
+++ b/block/qcow2-dirty-bitmap.c
@@ -300,7 +300,8 @@ int qcow2_dirty_bitmap_store(BlockDriverState *bs,
uint8_t *buf,
         }
     }
     if (dirty_bitmap_index < 0) {
-        qcow2_dirty_bitmap_create(bs, name, size, granularity);
+        ret = qcow2_dirty_bitmap_create(bs, name, size, granularity);
+        if (ret < 0) { return ret; }
         dirty_bitmap_index = s->nb_dirty_bitmaps - 1;
     }
     bm = s->dirty_bitmaps + dirty_bitmap_index;
@@ -384,7 +385,7 @@ int qcow2_dirty_bitmap_create(BlockDriverState *bs,
const char *name,
     bm->l1_size =
         size_to_clusters(s, (((size - 1) / sector_granularity) >> 3) + 1);
     l1_table_offset =
-        qcow2_alloc_clusters(bs, s->l1_size * sizeof(uint64_t));
+        qcow2_alloc_clusters(bs, bm->l1_size * sizeof(uint64_t));
     if (l1_table_offset < 0) {
         ret = l1_table_offset;
         goto fail;
@@ -398,18 +399,18 @@ int qcow2_dirty_bitmap_create(BlockDriverState
*bs, const char *name,
     }

     /* initialize with zero clusters */
-    for (i = 0; i < s->l1_size; i++) {
+    for (i = 0; i < bm->l1_size; i++) {
         l1_table[i] = cpu_to_be64(1);
     }

     ret = qcow2_pre_write_overlap_check(bs, 0, bm->l1_table_offset,
-                                        s->l1_size * sizeof(uint64_t));
+                                        bm->l1_size * sizeof(uint64_t));
     if (ret < 0) {
         goto fail;
     }

     ret = bdrv_pwrite(bs->file, bm->l1_table_offset, l1_table,
-                      s->l1_size * sizeof(uint64_t));
+                      bm->l1_size * sizeof(uint64_t));
     if (ret < 0) {
         goto fail;
     }
diff --git a/vl.c b/vl.c
index 4cae8a6..8f6d79f 100644
--- a/vl.c
+++ b/vl.c
@@ -1146,7 +1146,7 @@ static int dirty_bitmap_func(void *opaque,
QemuOpts *opts, Error **errp)
             qdict_put(options, "node-name", qstring_from_str(file_id));
         }

-        bdrv_open(&file_bs, file, NULL, options, 0, NULL, errp);
+        bdrv_open(&file_bs, file, NULL, options, BDRV_O_RDWR, NULL, errp);
         if (options) {
             QDECREF(options);
         }

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
  2015-06-12 19:34 ` John Snow
@ 2015-06-17 14:29   ` Vladimir Sementsov-Ogievskiy
  2015-06-24  0:21     ` John Snow
  0 siblings, 1 reply; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-06-17 14:29 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: kwolf, pbonzini, Fam Zheng, stefanha, den

On 12.06.2015 22:34, John Snow wrote:
>
> On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
>> v2:
>>   - rebase on my 'Dirty bitmaps migration' series
>>   - remove 'print dirty bitmap', 'query-dirty-bitmap' and use md5 for
>>     testing like with dirty bitmaps migration
>>   - autoclean features
>>
>> v1:
>>
>> The bitmaps are saved into qcow2 file format. It provides both
>> 'internal' and 'external' dirty bitmaps feature:
>>   - for qcow2 drives we can store bitmaps in the same file
>>   - for other formats we can store bitmaps in the separate qcow2 file
>>
>> QCow2 header is extended by fields 'nb_dirty_bitmaps' and
>> 'dirty_bitmaps_offset' like with snapshots.
>>
>> Proposed command line syntax is the following:
>>
>> -dirty-bitmap [option1=val1][,option2=val2]...
>>      Available options are:
>>      name         The name for the bitmap (necessary).
>>
>>      file         The file to load the bitmap from.
>>
>>      file_id      When specified with 'file' option, then this file will
>>                   be available through this id for other -dirty-bitmap
>>                   options when specified without 'file' option, then it
>>                   is a reference to 'file', specified with another
>>                   -dirty-bitmap option, and it will be used to load the
>>                   bitmap from.
>>
>>      drive        The drive to bind the bitmap to. It should be specified
>>                   as 'id' suboption of one of -drive options. If nor
>>                   'file' neither 'file_id' are specified, then the bitmap
>>                   will be loaded from that drive (internal dirty bitmap).
>>
>>      granularity  The granularity for the bitmap. Not necessary, the
>>                   default value may be used.
>>
>>      enabled      on|off. Default is 'on'. Disabled bitmaps are not
>>                   changing regardless of writes to corresponding drive.
>>
>> Examples:
>>
>> qemu -drive file=a.qcow2,id=disk -dirty-bitmap name=b,drive=disk
>> qemu -drive file=a.raw,id=disk \
>>       -dirty-bitmap name=b,drive=disk,file=b.qcow2,enabled=off
>>
>> Vladimir Sementsov-Ogievskiy (8):
>>    spec: add qcow2-dirty-bitmaps specification
>>    qcow2: add dirty-bitmaps feature
>>    block: store persistent dirty bitmaps
>>    block: add bdrv_load_dirty_bitmap
>>    qcow2: add qcow2_dirty_bitmap_delete_all
>>    qcow2: add autoclear bit for dirty bitmaps
>>    qemu: command line option for dirty bitmaps
>>    iotests: test internal persistent dirty bitmap
>>
>>   block.c                       |  82 +++++++
>>   block/Makefile.objs           |   2 +-
>>   block/qcow2-dirty-bitmap.c    | 537 ++++++++++++++++++++++++++++++++++++++++++
>>   block/qcow2.c                 |  69 +++++-
>>   block/qcow2.h                 |  61 +++++
>>   blockdev.c                    |  38 +++
>>   docs/specs/qcow2.txt          |  66 ++++++
>>   include/block/block.h         |   9 +
>>   include/block/block_int.h     |  10 +
>>   include/sysemu/blockdev.h     |   1 +
>>   include/sysemu/sysemu.h       |   1 +
>>   qemu-options.hx               |  37 +++
>>   tests/qemu-iotests/118        |  83 +++++++
>>   tests/qemu-iotests/118.out    |   5 +
>>   tests/qemu-iotests/group      |   1 +
>>   tests/qemu-iotests/iotests.py |   6 +
>>   vl.c                          | 100 ++++++++
>>   17 files changed, 1105 insertions(+), 3 deletions(-)
>>   create mode 100644 block/qcow2-dirty-bitmap.c
>>   create mode 100755 tests/qemu-iotests/118
>>   create mode 100644 tests/qemu-iotests/118.out
>>
> Well, you said "RFC" ... So here's some "C" that you RF'd.
>
> Many of these points are a "wish list" of sorts and don't necessarily
> have to be implemented all at once, but we should be careful to design
> the core series with the later additions in mind.
>
> Many of these items are things that I wouldn't mind working on
> (Primarily the QMP interfaces), provided that the core of this series
> will allow for them to exist. I can take many of the QMP/transaction
> interface projects, for instance.
>
> I'm starting to think we won't be able to squeeze this in for 2.4, but
> we can have a bulk of the work well underway for 2.5, by which point I
> am hopeful that libvirt will be beginning to pick up motion for
> integration of this feature.
>
> I think that the basic approach you have so far is good, we just have to
> plan out our required extensions and then we can review the base to make
> sure it supports the features we want in the near future.
>
>
> (1) General storage design
>
> - Persistence bitmaps can be stored in any arbitrary qcow2 file,
> regardless of if that qcow2 holds data or not.
>
> - Any given qcow2 file with or without data can hold bitmaps intended
> for any number of other drives.
Actually, dirty bitmap is not bound to the image, it just have a name, 
identifying it. We can (try to) load any bitmap for any image.
>
> - Dirty bitmaps are not assumed to be able to be stored in any
> particular location.
>
> So far, this is good. I like the flexibility this provides. This lets us
> do all kinds of cool things like store bitmaps for 20 different raw
> drives inside of a single 'bitmaps.qcow2' if we wish.
>
>
> (2) Bitmaps added via QMP do not get any persistence attributes.
>
> This is something we'll need to change. Existing QMP commands that let
> us modify bitmaps:
>
> block-dirty-bitmap-add		[+transaction]
> block-dirty-bitmap-remove
> block-dirty-bitmap-clear	[+transaction]
>
> - block-dirty-bitmap-add:
>
> We will want the ability for bitmap-add to specify a persistence option.
> What I am less clear on is what this attribute should look like.
>
> should we add target: <filename> as an attribute here,
> or should it be target: <node> to specify the file object that we want
> to store this bitmap in? Or perhaps both?:
>
> mode: file, target: <filename>
> mode: node, target: <node>
>
> Or even an explicit usability feature that lets us specify that we wish
> to store the bitmap for the drive we're attaching it to:
>
> block-dirty-bitmap=add node=drive0 name=bitmap0 mode=self
>
> The implication here is that the default value for persist could be
> "none", which does not attempt to store this bitmap anywhere.
>
> - block-dirty-bitmap-remove
>
> If we remove a bitmap with persistence options active, it needs to be
> cleared out of the file it is being stored in. Currently we use
> "release" to remove a bitmap, which deletes only the in-memory portion
> of the bitmap, so you also use release in your series to delete
> in-memory bitmaps after we're done with them.
>
> I think the semantics of the "remove" QMP option here, however, should
> include a call to the storage layer to remove the bitmap in question.
>
> Let's split the "release" function into two functions:
> (A) bdrv_dirty_bitmap_free (which just frees the in-memory bits)
> (B) bdrv_dirty_bitmap_delete (which relies on _free but deletes from
> disk also.)
>
> Then bdrv_close can use bitmap_free, but the QMP remove command can
> utilize _delete.
>
> - block-dirty-bitmap-clear:
>
> This needs to clear the bitmap on-disk if it has persistence features
> active.

Does it? When the bitmap is loaded, its representation on disk is 
inconsistent, and an in_use bit is set (on disk). So, we don't need to 
sync it here.
Syncing on 'remove' is not necessary for the same reason, but may take 
place to not store extra trash..

>
> - block-dirty-bitmap-copy:
>
> This is only a proposal currently, but worth us keeping it in mind. We
> should decide on copy semantics. Should the copy keep the persistence
> attributes of the source bitmap by default and allow a user to override
> it if desired, or should we force the persistence attribute back to
> null/None until the user overrides?
>
> I suspect defaulting it to no persistence is probably the sanest until
> we're told otherwise (either via an extension to the copy command or a
> later edit command.)
>
> Since the QMP interfaces has been my area so far, I can draft their
> addition as a new series if you'd like.

ok.

>
>
> (3) Additional QMP interfaces
>
> We should add the ability to modify a bitmap's persistence after it has
> been added.
>
> block-dirty-bitmap-edit mode=<file,node,self,none> target=<...>
>
> This will allow us to add persistence to a bitmap after creation, or
> remove persistence from a bitmap without deleting it if it's no longer
> desired.
>
> Perhaps at a later date we could even have it change where the bitmap is
> stored through this mechanism.
>
> (Usability features might include the ability for us to rename or change
> the granularity of the bitmap, too -- but that's future usability stuff,
> not core functionality.)
>
> Like the above, I can draft this addition.

no objections)

>
>
> (4) Storage Format
>
> I think overall the bitmap extension headers look sane, but Kevin is the
> ultimate authority here.
>
> I /would/ like to see an additional header bitfield reserved
> for some arbitrary flags that can be used at a later date. A uint32_t
> should be sufficient for now, with some of the upper bits reserved
> either for an extension or a version field to allow us to expand the
> bitmap headers in the future if necessary.

ok

>
>
> (5) Bitmap autoloading
>
> Bitmaps are not currently automatically loaded if you pass e.g. (-hda
> my_drive_that_also_has_bitmaps.qcow2). This is in part because the drive
> a bitmap was intended for is not information stored with the bitmap, so
> QEMU has no concept or ability to be able to "auto load" bitmaps.
>
> Hinted at earlier by my desire to see something like mode=self, we
> should add some flags to the dirty bitmap header stored with each bitmap:
>
> 0x01: "This bitmap describes the file it is stored in"
> 0x02: "This bitmap should be auto-loaded when this file is opened."
> 0x04: "This bitmap is read-only (disabled.)"

The last one - should it be used only for auto-loading bitmaps?

>
> This way, with a properly modern version of QEMU, you could simply just:
>
> qemu -M q35 -enable-kvm -hda windows10.qcow2
>
> and if there were bitmaps inside of windows10.qcow2 that had 0x01 and
> 0x02 set, you'd get those bitmaps loaded before any IO to the data
> clusters of the .qcow2, ensuring data integrity.
>
> Of course, I think that it is currently too complicated to try to
> accomplish autoloading of bitmaps for *other* drives, so let's not worry
> about that now. This means 0x02 set without 0x01 would be an error.
>
> Of course, when autoloading bitmaps, we'll have to check that the size
> of the bitmap matches the size of the drive. This is easy to do, though.

it is always checked)

>
> The 0x01 bit can be set automatically when that circumstance is
> detected, and 0x02 can be set perhaps as an option to
> --dirty-bitmap auto=yes
> or via the QMP
> block-dirty-bitmap-add ... auto=yes
> or via the edit command,
> block-dirty-bitmap-edit ... auto=yes
>
> Maybe we could also set it implicitly if mode=self is used, too.

Also, for auto-loading bitmaps, user can manually load it (changing 
'disabled' bit). And in this case auto-loading should be skipped.
Also, if auto-loading is default behavior, than what about 
--disable-bitmap-autoloading or something like this?

>
> (6) qemu-img interface
>
> Stefan has mentioned that it would be nice to implement a query ability
> to qemu-img to list bitmaps stored in qcow2 files, along with some of
> their key attributes. size, granularity, any flags. It's probably not
> efficient to list the dirty count, unless we begin storing that
> information manually in the header. I don't think there's a strong need
> for that level of info, though.
>
> I can handle this part, if you'd like.

can qmp query block with information about bitmaps be reused here?

>
> (7) CLI interface
>
> - The only way to get a bitmap loaded into memory from file is to use
> the --dirty-bitmap argument where you specify the name, file,
> destination drive, and granularity.
>
> - The only way to create a new bitmap that will integrate with the
> persistence features is to specify a new bitmap that does not currently
> exist within a file and allow the qcow2 layer to create the in-memory
> bitmap for us.
>
> This helps us with the flexibility that makes this design a winning
> choice overall, but it's cumbersome for some special common cases I
> think we should be supporting.
>
> As mentioned previously, I think granularity should not be part
> of the lookup process -- just creation, and even then I think this CLI
> syntax should not automatically create bitmaps if it wasn't found -- if
> the user didn't intend to make a bitmap, an error is likely more
> appropriate.
>
> Perhaps --dirty-bitmap create=true,[...] would be sufficient for
> specifying intent here, at which point granularity makes sense for the
> creation process.
>
> As for the granularity, I think this should be appropriate:
>
> --dirty-bitmap file=bitmaps.qcow2,name=bitmap0,drive=drive0
>
> And that should be sufficient to look in bitmaps.qcow2, find 'bitmap0',
> and attach it to 'drive0', throwing an error if the sizes don't match.

agree, I will do it

>
> (8) Namespaces
>
> Stefan also asked me about the bitmap namespaces -- in-memory of course,
> each node can have their own "bitmap0" without any collisions because
> all bitmaps are always referred to by their (bs,name) pair.
>
> How do we address bitmaps inside a file, though?
>
> If any given bitmap containing .qcow2 file can store an arbitrary number
> of bitmaps intended for an arbitrary number of destinations, how do we
> handle this?
>
> EXAMPLE:
> -dirty-bitmap name=bitmap0,drive=drive0,file=bitmaps.qcow2
> -dirty-bitmap name=bitmap0,drive=drive1,file=bitmaps.qcow2
>
> I think this might currently do very funky things, if bitmaps.qcow2 is
> currently empty -- I think both calls will succeed, but it will fail
> later when it tries to store them and cannot.
>
> I think we need to do one of two things:
>
> (A) Keep the namespace inside of a .qcow2 file as it is now, but ALWAYS
> check up front if a bitmap *can* be added to the file. This way we don't
> run into problems after we've dirtied the bitmap.
>
> (B) Find a way to accommodate bitmaps with the same names that were
> intended for different nodes.
>
> I don't have a good idea for #2, so I think #1 is probably the way to
> go. We can amend the bitmap documentation to specify that although the
> bitmap names are unique per-node, if you want to store them in the same
> file, you're going to want to give them globally unique names.

A: And what about the case: several raw disks and bitmaps.qcow2? In this 
case using of namespaces is impossible. Or we are going to have 
*-bitmap.qcow2 for each disk..

B: As I understand, we have no id or name for the image, it comes from 
cmd line.. So we can't use node name as namespace name. Why not just add 
namespaces?

-drive file=a.raw,id=disk1,dirty-bitmaps-namespace=disk1_ns \
-drive file=b.raw,id=disk2,dirty-bitmaps-namespace=disk2_ns \
  -dirty-bitmap name=bitmap0,drive=disk1,file=bitmaps.qcow2
  -dirty-bitmap name=bitmap1,drive=disk1,file=bitmaps.qcow2
  -dirty-bitmap name=bitmap0,drive=disk2,file=bitmaps.qcow2


Default namespace: empty string or node name?
Namespace name should be stored in bitmap header for each bitmap.. As 
separate field with length field, or may be as bitmap name part 
(separated from it by '#' character for example)

>
>
> (9) Data consistency
>
> We need to discuss the data safety element to this. I think that
> atomically before the first write is flushed to disk, the dirty bitmap
> needs to *at least* set a bit in the bitmap header that indicates that
> the bitmap is no longer up-to-date.
>
> When the bitmap is later flushed to disk, that bit can be cleared until
> the next write occurs, which repeats the process.
>
> We have discussed this (long ago) in the past, but one of the ideas was
> to monitor the relative utilization rate of the disk and attempt to
> flush the bitmap whenever there was a lull in disk IO, then clear the
> "inconsistent" bit.
>
> On close, the flush of data and bitmap both would lead us to clear this
> bit as well.
>
> Upon boot, if the inconsistent bit was set, we'd know that the bitmap
> was outdated and we'd have to recommend that the bitmap be cleared and a
> new bitmap started.
>
> (Or, perhaps, a data-intensive mode where we compare the current data
> mode with the most recent incremental backup to re-determine what data
> has changed. This would be very, very slow but an option at least for
> recovery if started a new full backup is even less desirable.)
>
> Other ideas involve regularly flushing the bitmap at certain timed
> intervals, certain usage intervals (e.g. when the changed bitmap data
> reaches some total size, like 64KiB of changed bits), or a combination
> of regular intervals with "opportunistic" flushing during Disk IO lulls.
>
> This is a key feature that absolutely needs to make it into the base
> series, IMO.

I don't understand, what the use of flushing bitmap not only on 
disk:close? If there no failures with disk, than bitmap will be flushed 
on close and will be consistent for next open(). If there is a disk 
crash, even if we flush the bitmap regularly, what is the possibility of 
crashing immediately after last flush, before further io-s?

>
> (10) Storage Efficiency
>
> We should discuss the usage of meta bitmaps or ancillary bitmaps to
> record which parts of our bitmap data need to be flushed to disk in
> order to reduce flush/close time.
>
> The current meta bitmap implementation optimizes for 1KiB writes to the
> network (which fits well under the standard 1500bytes), but perhaps we
> could optimize for local storage block size and use this to be stingy
> about how much data we decide to write to disk.
>
> I believe this is another feature that should be included in the initial
> series as well, because it might radically impact the core design.

ok

>
> (11) Migration
>
> Stefan already touched on this, but we should be mindful of the
> different kinds of migration scenarios.
>
> We might migrate the disks, or they might be shared already.
>
> We might migrate (or share) a disk, but what happens if we didn't
> migrate or didn't share the bitmap storage file that we were using?
>
> Bitmaps without persistence data will migrate just fine, but how do we
> intend to migrate the persistence data itself? I suppose as a first pass
> we can just tap into the migration calls and migrate some properties like:
>
> "This bitmap relies on node_id=xxxx to save its bitmap"
>
> and that should probably work for either kind of storage migration
> tactic. The only problem would be nodes without IDs that we opened by
> filename ...

It looks like some bitmaps may be migrated automatically (i.e. created 
on destination, if they don't exist), but others don't. This means, that 
user should describe bitmaps in destination cmd, at least bitmaps, 
loaded from file, not node name. And in this case, migration of 
persistent bitmap will success if there is a bitmap on destination for 
the same node, with the same name and granularity and with set 'file' 
field. Otherwise migration fails..

>
> ...Another technique would be for any bitmap that is persistent is to
> store them all first prior to migration and then allow the destination
> to load them anew. This would also work for either shared or migrated
> storage if we worked it right.
>
> It seems a little hairy, and I don't have the answers right now...
> Something I will ponder on the weekend.


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature
  2015-06-15 14:42       ` Stefan Hajnoczi
@ 2015-06-23 17:57         ` John Snow
  2015-06-24  9:39           ` Stefan Hajnoczi
  0 siblings, 1 reply; 76+ messages in thread
From: John Snow @ 2015-06-23 17:57 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, Vladimir Sementsov-Ogievskiy, qemu-devel,
	Vladimir Sementsov-Ogievskiy, den, pbonzini



On 06/15/2015 10:42 AM, Stefan Hajnoczi wrote:
> On Fri, Jun 12, 2015 at 03:02:33PM -0400, John Snow wrote:
>>
>>
>> On 06/10/2015 10:30 AM, Stefan Hajnoczi wrote:
>>> On Mon, Jun 08, 2015 at 06:21:20PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>
>>> I noticed a corner case, it's probably not a problem in practice:
>>>
>>> Since the dirty bitmap is stored with the help of a BlockDriverState
>>> (and its bs->file), it's possible that writing the bitmap will cause
>>> bits in the bitmap to be dirtied!
>>>
>>
>> But since it's metadata and not stored within a disk sector, can this
>> actually happen? Do you have an example of a scenario where this might
>> come up?
> 
> The persistent dirty bitmap for bs->file is storeed in the qcow2 BDS.
> This results in recursion.
> 
> This is a misconfiguration but I just want to understand what happens
> when someone does this by mistake.
> 
> Stefan
> 

I still don't follow you, actually.

The dirty bitmap only tracks changed virtual disk sectors, not actual
file sectors, right? Writing a bitmap that describes foo.qcow2 to
foo.qcow2 won't dirty bitmaps, it's an out-of-channel write as far as
the bitmap is concerned.

Right? Am I fatally misunderstanding the situation?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
  2015-06-17 14:29   ` Vladimir Sementsov-Ogievskiy
@ 2015-06-24  0:21     ` John Snow
  2015-07-08 12:24       ` Vladimir Sementsov-Ogievskiy
  2015-08-27 10:08       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 76+ messages in thread
From: John Snow @ 2015-06-24  0:21 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, pbonzini, Fam Zheng, stefanha, den



On 06/17/2015 10:29 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 12.06.2015 22:34, John Snow wrote:
>>
>> On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> v2:
>>>   - rebase on my 'Dirty bitmaps migration' series
>>>   - remove 'print dirty bitmap', 'query-dirty-bitmap' and use md5 for
>>>     testing like with dirty bitmaps migration
>>>   - autoclean features
>>>
>>> v1:
>>>
>>> The bitmaps are saved into qcow2 file format. It provides both
>>> 'internal' and 'external' dirty bitmaps feature:
>>>   - for qcow2 drives we can store bitmaps in the same file
>>>   - for other formats we can store bitmaps in the separate qcow2 file
>>>
>>> QCow2 header is extended by fields 'nb_dirty_bitmaps' and
>>> 'dirty_bitmaps_offset' like with snapshots.
>>>
>>> Proposed command line syntax is the following:
>>>
>>> -dirty-bitmap [option1=val1][,option2=val2]...
>>>      Available options are:
>>>      name         The name for the bitmap (necessary).
>>>
>>>      file         The file to load the bitmap from.
>>>
>>>      file_id      When specified with 'file' option, then this file will
>>>                   be available through this id for other -dirty-bitmap
>>>                   options when specified without 'file' option, then it
>>>                   is a reference to 'file', specified with another
>>>                   -dirty-bitmap option, and it will be used to load the
>>>                   bitmap from.
>>>
>>>      drive        The drive to bind the bitmap to. It should be
>>> specified
>>>                   as 'id' suboption of one of -drive options. If nor
>>>                   'file' neither 'file_id' are specified, then the
>>> bitmap
>>>                   will be loaded from that drive (internal dirty
>>> bitmap).
>>>
>>>      granularity  The granularity for the bitmap. Not necessary, the
>>>                   default value may be used.
>>>
>>>      enabled      on|off. Default is 'on'. Disabled bitmaps are not
>>>                   changing regardless of writes to corresponding drive.
>>>
>>> Examples:
>>>
>>> qemu -drive file=a.qcow2,id=disk -dirty-bitmap name=b,drive=disk
>>> qemu -drive file=a.raw,id=disk \
>>>       -dirty-bitmap name=b,drive=disk,file=b.qcow2,enabled=off
>>>
>>> Vladimir Sementsov-Ogievskiy (8):
>>>    spec: add qcow2-dirty-bitmaps specification
>>>    qcow2: add dirty-bitmaps feature
>>>    block: store persistent dirty bitmaps
>>>    block: add bdrv_load_dirty_bitmap
>>>    qcow2: add qcow2_dirty_bitmap_delete_all
>>>    qcow2: add autoclear bit for dirty bitmaps
>>>    qemu: command line option for dirty bitmaps
>>>    iotests: test internal persistent dirty bitmap
>>>
>>>   block.c                       |  82 +++++++
>>>   block/Makefile.objs           |   2 +-
>>>   block/qcow2-dirty-bitmap.c    | 537
>>> ++++++++++++++++++++++++++++++++++++++++++
>>>   block/qcow2.c                 |  69 +++++-
>>>   block/qcow2.h                 |  61 +++++
>>>   blockdev.c                    |  38 +++
>>>   docs/specs/qcow2.txt          |  66 ++++++
>>>   include/block/block.h         |   9 +
>>>   include/block/block_int.h     |  10 +
>>>   include/sysemu/blockdev.h     |   1 +
>>>   include/sysemu/sysemu.h       |   1 +
>>>   qemu-options.hx               |  37 +++
>>>   tests/qemu-iotests/118        |  83 +++++++
>>>   tests/qemu-iotests/118.out    |   5 +
>>>   tests/qemu-iotests/group      |   1 +
>>>   tests/qemu-iotests/iotests.py |   6 +
>>>   vl.c                          | 100 ++++++++
>>>   17 files changed, 1105 insertions(+), 3 deletions(-)
>>>   create mode 100644 block/qcow2-dirty-bitmap.c
>>>   create mode 100755 tests/qemu-iotests/118
>>>   create mode 100644 tests/qemu-iotests/118.out
>>>
>> Well, you said "RFC" ... So here's some "C" that you RF'd.
>>
>> Many of these points are a "wish list" of sorts and don't necessarily
>> have to be implemented all at once, but we should be careful to design
>> the core series with the later additions in mind.
>>
>> Many of these items are things that I wouldn't mind working on
>> (Primarily the QMP interfaces), provided that the core of this series
>> will allow for them to exist. I can take many of the QMP/transaction
>> interface projects, for instance.
>>
>> I'm starting to think we won't be able to squeeze this in for 2.4, but
>> we can have a bulk of the work well underway for 2.5, by which point I
>> am hopeful that libvirt will be beginning to pick up motion for
>> integration of this feature.
>>
>> I think that the basic approach you have so far is good, we just have to
>> plan out our required extensions and then we can review the base to make
>> sure it supports the features we want in the near future.
>>
>>
>> (1) General storage design
>>
>> - Persistence bitmaps can be stored in any arbitrary qcow2 file,
>> regardless of if that qcow2 holds data or not.
>>
>> - Any given qcow2 file with or without data can hold bitmaps intended
>> for any number of other drives.
>
> Actually, dirty bitmap is not bound to the image, it just have a name,
> identifying it. We can (try to) load any bitmap for any image.
>

I'm not sure what you mean by "bound" here, but yes, as it stands: the
design is very flexible and I like that. It appears that bitmaps for any
number of images can be simultaneously stored in a .qcow2 as a generic
container, so it's a very flexible approach.

I didn't mean to imply that you couldn't do that already, because it
looks like you can.

>>
>> - Dirty bitmaps are not assumed to be able to be stored in any
>> particular location.
>>
>> So far, this is good. I like the flexibility this provides. This lets us
>> do all kinds of cool things like store bitmaps for 20 different raw
>> drives inside of a single 'bitmaps.qcow2' if we wish.
>>
>>
>> (2) Bitmaps added via QMP do not get any persistence attributes.
>>
>> This is something we'll need to change. Existing QMP commands that let
>> us modify bitmaps:
>>
>> block-dirty-bitmap-add        [+transaction]
>> block-dirty-bitmap-remove
>> block-dirty-bitmap-clear    [+transaction]
>>
>> - block-dirty-bitmap-add:
>>
>> We will want the ability for bitmap-add to specify a persistence option.
>> What I am less clear on is what this attribute should look like.
>>
>> should we add target: <filename> as an attribute here,
>> or should it be target: <node> to specify the file object that we want
>> to store this bitmap in? Or perhaps both?:
>>
>> mode: file, target: <filename>
>> mode: node, target: <node>
>>
>> Or even an explicit usability feature that lets us specify that we wish
>> to store the bitmap for the drive we're attaching it to:
>>
>> block-dirty-bitmap=add node=drive0 name=bitmap0 mode=self
>>
>> The implication here is that the default value for persist could be
>> "none", which does not attempt to store this bitmap anywhere.
>>
>> - block-dirty-bitmap-remove
>>
>> If we remove a bitmap with persistence options active, it needs to be
>> cleared out of the file it is being stored in. Currently we use
>> "release" to remove a bitmap, which deletes only the in-memory portion
>> of the bitmap, so you also use release in your series to delete
>> in-memory bitmaps after we're done with them.
>>
>> I think the semantics of the "remove" QMP option here, however, should
>> include a call to the storage layer to remove the bitmap in question.
>>
>> Let's split the "release" function into two functions:
>> (A) bdrv_dirty_bitmap_free (which just frees the in-memory bits)
>> (B) bdrv_dirty_bitmap_delete (which relies on _free but deletes from
>> disk also.)
>>
>> Then bdrv_close can use bitmap_free, but the QMP remove command can
>> utilize _delete.
>>
>> - block-dirty-bitmap-clear:
>>
>> This needs to clear the bitmap on-disk if it has persistence features
>> active.
> 
> Does it? When the bitmap is loaded, its representation on disk is
> inconsistent, and an in_use bit is set (on disk). So, we don't need to
> sync it here.
> Syncing on 'remove' is not necessary for the same reason, but may take
> place to not store extra trash..
> 

Keeping an in-use bit is definitely a way to accommodate this QMP
command, so you're right, we don't need to pay special attention here --
unless we go with some kind of a periodic flush model, at which point if
the bitmap is "clean," this command will need to re-mark it as dirty.

Just a generic ->mark_bitmap_dirty() op to call here would suffice entirely.

>>
>> - block-dirty-bitmap-copy:
>>
>> This is only a proposal currently, but worth us keeping it in mind. We
>> should decide on copy semantics. Should the copy keep the persistence
>> attributes of the source bitmap by default and allow a user to override
>> it if desired, or should we force the persistence attribute back to
>> null/None until the user overrides?
>>
>> I suspect defaulting it to no persistence is probably the sanest until
>> we're told otherwise (either via an extension to the copy command or a
>> later edit command.)
>>
>> Since the QMP interfaces has been my area so far, I can draft their
>> addition as a new series if you'd like.
> 
> ok.
> 
>>
>>
>> (3) Additional QMP interfaces
>>
>> We should add the ability to modify a bitmap's persistence after it has
>> been added.
>>
>> block-dirty-bitmap-edit mode=<file,node,self,none> target=<...>
>>
>> This will allow us to add persistence to a bitmap after creation, or
>> remove persistence from a bitmap without deleting it if it's no longer
>> desired.
>>
>> Perhaps at a later date we could even have it change where the bitmap is
>> stored through this mechanism.
>>
>> (Usability features might include the ability for us to rename or change
>> the granularity of the bitmap, too -- but that's future usability stuff,
>> not core functionality.)
>>
>> Like the above, I can draft this addition.
> 
> no objections)
> 
>>
>>
>> (4) Storage Format
>>
>> I think overall the bitmap extension headers look sane, but Kevin is the
>> ultimate authority here.
>>
>> I /would/ like to see an additional header bitfield reserved
>> for some arbitrary flags that can be used at a later date. A uint32_t
>> should be sufficient for now, with some of the upper bits reserved
>> either for an extension or a version field to allow us to expand the
>> bitmap headers in the future if necessary.
> 
> ok
> 
>>
>>
>> (5) Bitmap autoloading
>>
>> Bitmaps are not currently automatically loaded if you pass e.g. (-hda
>> my_drive_that_also_has_bitmaps.qcow2). This is in part because the drive
>> a bitmap was intended for is not information stored with the bitmap, so
>> QEMU has no concept or ability to be able to "auto load" bitmaps.
>>
>> Hinted at earlier by my desire to see something like mode=self, we
>> should add some flags to the dirty bitmap header stored with each bitmap:
>>
>> 0x01: "This bitmap describes the file it is stored in"
>> 0x02: "This bitmap should be auto-loaded when this file is opened."
>> 0x04: "This bitmap is read-only (disabled.)"
> 
> The last one - should it be used only for auto-loading bitmaps?
> 

Not necessarily, I just lumped it in here as an example to be grouped
near the other flags I thought we needed. Maybe we don't actually need a
read only flag to be stored because there's not currently a use-case for
RO bitmaps outside of migration.

Just a passing thought.

>>
>> This way, with a properly modern version of QEMU, you could simply just:
>>
>> qemu -M q35 -enable-kvm -hda windows10.qcow2
>>
>> and if there were bitmaps inside of windows10.qcow2 that had 0x01 and
>> 0x02 set, you'd get those bitmaps loaded before any IO to the data
>> clusters of the .qcow2, ensuring data integrity.
>>
>> Of course, I think that it is currently too complicated to try to
>> accomplish autoloading of bitmaps for *other* drives, so let's not worry
>> about that now. This means 0x02 set without 0x01 would be an error.
>>
>> Of course, when autoloading bitmaps, we'll have to check that the size
>> of the bitmap matches the size of the drive. This is easy to do, though.
> 
> it is always checked)
> 

You're right. I think I wasn't convinced we needed size (etc) to be part
of the lookup process, but the way you have it now it does always check
the sizes. Just thinking out loud again.

>>
>> The 0x01 bit can be set automatically when that circumstance is
>> detected, and 0x02 can be set perhaps as an option to
>> --dirty-bitmap auto=yes
>> or via the QMP
>> block-dirty-bitmap-add ... auto=yes
>> or via the edit command,
>> block-dirty-bitmap-edit ... auto=yes
>>
>> Maybe we could also set it implicitly if mode=self is used, too.
> 
> Also, for auto-loading bitmaps, user can manually load it (changing
> 'disabled' bit). And in this case auto-loading should be skipped.
> Also, if auto-loading is default behavior, than what about
> --disable-bitmap-autoloading or something like this?
> 

Agreed. Whatever the default is, we need a way to turn it off and be
explicit about it. Perhaps as an argument to -drive?

e.g.

-drive if=none,file=linux-and-bitmaps.qcow2,bitmap=<auto,no>

where "no" would be a very explicit "Do not load any of the bitmaps in
this file."

"auto" would load automatically any of the bitmaps stored there with the
auto/self flags set, and skip the rest otherwise.

Maybe this is serviceable.

>>
>> (6) qemu-img interface
>>
>> Stefan has mentioned that it would be nice to implement a query ability
>> to qemu-img to list bitmaps stored in qcow2 files, along with some of
>> their key attributes. size, granularity, any flags. It's probably not
>> efficient to list the dirty count, unless we begin storing that
>> information manually in the header. I don't think there's a strong need
>> for that level of info, though.
>>
>> I can handle this part, if you'd like.
> 
> can qmp query block with information about bitmaps be reused here?
> 

I don't think so. Here we'd be reading the bitmaps on-disk and reporting
the info stored in-file, instead of the in-memory structures.

>>
>> (7) CLI interface
>>
>> - The only way to get a bitmap loaded into memory from file is to use
>> the --dirty-bitmap argument where you specify the name, file,
>> destination drive, and granularity.
>>
>> - The only way to create a new bitmap that will integrate with the
>> persistence features is to specify a new bitmap that does not currently
>> exist within a file and allow the qcow2 layer to create the in-memory
>> bitmap for us.
>>
>> This helps us with the flexibility that makes this design a winning
>> choice overall, but it's cumbersome for some special common cases I
>> think we should be supporting.
>>
>> As mentioned previously, I think granularity should not be part
>> of the lookup process -- just creation, and even then I think this CLI
>> syntax should not automatically create bitmaps if it wasn't found -- if
>> the user didn't intend to make a bitmap, an error is likely more
>> appropriate.
>>
>> Perhaps --dirty-bitmap create=true,[...] would be sufficient for
>> specifying intent here, at which point granularity makes sense for the
>> creation process.
>>
>> As for the granularity, I think this should be appropriate:
>>
>> --dirty-bitmap file=bitmaps.qcow2,name=bitmap0,drive=drive0
>>
>> And that should be sufficient to look in bitmaps.qcow2, find 'bitmap0',
>> and attach it to 'drive0', throwing an error if the sizes don't match.
> 
> agree, I will do it
> 

Great! I promise I do like the series overall even if I had a
book-length comment about it :)

>>
>> (8) Namespaces
>>
>> Stefan also asked me about the bitmap namespaces -- in-memory of course,
>> each node can have their own "bitmap0" without any collisions because
>> all bitmaps are always referred to by their (bs,name) pair.
>>
>> How do we address bitmaps inside a file, though?
>>
>> If any given bitmap containing .qcow2 file can store an arbitrary number
>> of bitmaps intended for an arbitrary number of destinations, how do we
>> handle this?
>>
>> EXAMPLE:
>> -dirty-bitmap name=bitmap0,drive=drive0,file=bitmaps.qcow2
>> -dirty-bitmap name=bitmap0,drive=drive1,file=bitmaps.qcow2
>>
>> I think this might currently do very funky things, if bitmaps.qcow2 is
>> currently empty -- I think both calls will succeed, but it will fail
>> later when it tries to store them and cannot.
>>
>> I think we need to do one of two things:
>>
>> (A) Keep the namespace inside of a .qcow2 file as it is now, but ALWAYS
>> check up front if a bitmap *can* be added to the file. This way we don't
>> run into problems after we've dirtied the bitmap.
>>

To clarify, I meant "every bitmap name inside of a file is unique. Check
to make sure it is possible to store a new bitmap upon its creation."

>> (B) Find a way to accommodate bitmaps with the same names that were
>> intended for different nodes.
>>
>> I don't have a good idea for #2, so I think #1 is probably the way to
>> go. We can amend the bitmap documentation to specify that although the
>> bitmap names are unique per-node, if you want to store them in the same
>> file, you're going to want to give them globally unique names.
> 
> A: And what about the case: several raw disks and bitmaps.qcow2? In this
> case using of namespaces is impossible. Or we are going to have
> *-bitmap.qcow2 for each disk..
> 

We could continue letting users do drive0 bitmap0 and drive1 bitmap0,
but as soon as they try to use the QMP commands to store those bitmaps,
the QMP command will report an error if the bitmaps.qcow2 already has a
"bitmap0."

It effectively uses a per-file namespace for bitmaps and applies that
restriction to any in-memory bitmaps created with persistence flags.

> B: As I understand, we have no id or name for the image, it comes from
> cmd line.. So we can't use node name as namespace name. Why not just add
> namespaces?
> 
> -drive file=a.raw,id=disk1,dirty-bitmaps-namespace=disk1_ns \
> -drive file=b.raw,id=disk2,dirty-bitmaps-namespace=disk2_ns \
>  -dirty-bitmap name=bitmap0,drive=disk1,file=bitmaps.qcow2
>  -dirty-bitmap name=bitmap1,drive=disk1,file=bitmaps.qcow2
>  -dirty-bitmap name=bitmap0,drive=disk2,file=bitmaps.qcow2
> 
> 
> Default namespace: empty string or node name?
> Namespace name should be stored in bitmap header for each bitmap.. As
> separate field with length field, or may be as bitmap name part
> (separated from it by '#' character for example)
> 

I think that's starting to get a little too manual and verbose on the
CLI at this point. Maybe we really should just enforce the first option
and call it a day.

>>
>>
>> (9) Data consistency
>>
>> We need to discuss the data safety element to this. I think that
>> atomically before the first write is flushed to disk, the dirty bitmap
>> needs to *at least* set a bit in the bitmap header that indicates that
>> the bitmap is no longer up-to-date.
>>
>> When the bitmap is later flushed to disk, that bit can be cleared until
>> the next write occurs, which repeats the process.
>>
>> We have discussed this (long ago) in the past, but one of the ideas was
>> to monitor the relative utilization rate of the disk and attempt to
>> flush the bitmap whenever there was a lull in disk IO, then clear the
>> "inconsistent" bit.
>>
>> On close, the flush of data and bitmap both would lead us to clear this
>> bit as well.
>>
>> Upon boot, if the inconsistent bit was set, we'd know that the bitmap
>> was outdated and we'd have to recommend that the bitmap be cleared and a
>> new bitmap started.
>>
>> (Or, perhaps, a data-intensive mode where we compare the current data
>> mode with the most recent incremental backup to re-determine what data
>> has changed. This would be very, very slow but an option at least for
>> recovery if started a new full backup is even less desirable.)
>>
>> Other ideas involve regularly flushing the bitmap at certain timed
>> intervals, certain usage intervals (e.g. when the changed bitmap data
>> reaches some total size, like 64KiB of changed bits), or a combination
>> of regular intervals with "opportunistic" flushing during Disk IO lulls.
>>
>> This is a key feature that absolutely needs to make it into the base
>> series, IMO.
> 
> I don't understand, what the use of flushing bitmap not only on
> disk:close? If there no failures with disk, than bitmap will be flushed
> on close and will be consistent for next open(). If there is a disk
> crash, even if we flush the bitmap regularly, what is the possibility of
> crashing immediately after last flush, before further io-s?
> 

The usage case is QEMU crash, power failure, etc. Not disk crash. If we
periodically flush to HD, we increase the chances that we don't corrupt
our image and bitmap.

If we NEVER flush, we guarantee that any segfault or power outage will
absolutely trash our data.

>>
>> (10) Storage Efficiency
>>
>> We should discuss the usage of meta bitmaps or ancillary bitmaps to
>> record which parts of our bitmap data need to be flushed to disk in
>> order to reduce flush/close time.
>>
>> The current meta bitmap implementation optimizes for 1KiB writes to the
>> network (which fits well under the standard 1500bytes), but perhaps we
>> could optimize for local storage block size and use this to be stingy
>> about how much data we decide to write to disk.
>>
>> I believe this is another feature that should be included in the initial
>> series as well, because it might radically impact the core design.
> 
> ok
> 

Yeah, we might just end up having meta_bitmaps on all the time and rely
on them to know what remains to be written to disk. I think Stefan
wasn't too keen on the idea of a 512GiB disk needing to write a solid
1MiB of data on every close, when in practice we might be able to reduce
it to just a handful of block writes.

>>
>> (11) Migration
>>
>> Stefan already touched on this, but we should be mindful of the
>> different kinds of migration scenarios.
>>
>> We might migrate the disks, or they might be shared already.
>>
>> We might migrate (or share) a disk, but what happens if we didn't
>> migrate or didn't share the bitmap storage file that we were using?
>>
>> Bitmaps without persistence data will migrate just fine, but how do we
>> intend to migrate the persistence data itself? I suppose as a first pass
>> we can just tap into the migration calls and migrate some properties
>> like:
>>
>> "This bitmap relies on node_id=xxxx to save its bitmap"
>>
>> and that should probably work for either kind of storage migration
>> tactic. The only problem would be nodes without IDs that we opened by
>> filename ...
> 
> It looks like some bitmaps may be migrated automatically (i.e. created
> on destination, if they don't exist), but others don't. This means, that
> user should describe bitmaps in destination cmd, at least bitmaps,
> loaded from file, not node name. And in this case, migration of
> persistent bitmap will success if there is a bitmap on destination for
> the same node, with the same name and granularity and with set 'file'
> field. Otherwise migration fails..
> 

I think I still need to think about this one for a little bit, but I
think there's other work we can do in the meantime at least.

>>
>> ...Another technique would be for any bitmap that is persistent is to
>> store them all first prior to migration and then allow the destination
>> to load them anew. This would also work for either shared or migrated
>> storage if we worked it right.
>>
>> It seems a little hairy, and I don't have the answers right now...
>> Something I will ponder on the weekend.
> 
> 

Thanks!

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature
  2015-06-23 17:57         ` John Snow
@ 2015-06-24  9:39           ` Stefan Hajnoczi
  0 siblings, 0 replies; 76+ messages in thread
From: Stefan Hajnoczi @ 2015-06-24  9:39 UTC (permalink / raw)
  To: John Snow
  Cc: kwolf, Vladimir Sementsov-Ogievskiy, qemu-devel,
	Vladimir Sementsov-Ogievskiy, den, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1622 bytes --]

On Tue, Jun 23, 2015 at 01:57:55PM -0400, John Snow wrote:
> On 06/15/2015 10:42 AM, Stefan Hajnoczi wrote:
> > On Fri, Jun 12, 2015 at 03:02:33PM -0400, John Snow wrote:
> >>
> >>
> >> On 06/10/2015 10:30 AM, Stefan Hajnoczi wrote:
> >>> On Mon, Jun 08, 2015 at 06:21:20PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> >>>
> >>> I noticed a corner case, it's probably not a problem in practice:
> >>>
> >>> Since the dirty bitmap is stored with the help of a BlockDriverState
> >>> (and its bs->file), it's possible that writing the bitmap will cause
> >>> bits in the bitmap to be dirtied!
> >>>
> >>
> >> But since it's metadata and not stored within a disk sector, can this
> >> actually happen? Do you have an example of a scenario where this might
> >> come up?
> > 
> > The persistent dirty bitmap for bs->file is storeed in the qcow2 BDS.
> > This results in recursion.
> > 
> > This is a misconfiguration but I just want to understand what happens
> > when someone does this by mistake.
> > 
> > Stefan
> > 
> 
> I still don't follow you, actually.
> 
> The dirty bitmap only tracks changed virtual disk sectors, not actual
> file sectors, right? Writing a bitmap that describes foo.qcow2 to
> foo.qcow2 won't dirty bitmaps, it's an out-of-channel write as far as
> the bitmap is concerned.
> 
> Right? Am I fatally misunderstanding the situation?

There is no out-of-channel for bs->file.  The bs->file raw-posix image
file includes the bs qcow2 sectors that hold the dirty bitmap.

This is a corner case and I don't think any valid configuration would
hit it.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
  2015-06-24  0:21     ` John Snow
@ 2015-07-08 12:24       ` Vladimir Sementsov-Ogievskiy
  2015-07-08 15:21         ` John Snow
  2015-08-27 10:08       ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-07-08 12:24 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: kwolf, pbonzini, Fam Zheng, stefanha, den

On 24.06.2015 03:21, John Snow wrote:
>
> On 06/17/2015 10:29 AM, Vladimir Sementsov-Ogievskiy wrote:
>> On 12.06.2015 22:34, John Snow wrote:
>>> On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> v2:
>>>>    - rebase on my 'Dirty bitmaps migration' series
>>>>    - remove 'print dirty bitmap', 'query-dirty-bitmap' and use md5 for
>>>>      testing like with dirty bitmaps migration
>>>>    - autoclean features
>>>>
>>>> v1:
>>>>
>>>> The bitmaps are saved into qcow2 file format. It provides both
>>>> 'internal' and 'external' dirty bitmaps feature:
>>>>    - for qcow2 drives we can store bitmaps in the same file
>>>>    - for other formats we can store bitmaps in the separate qcow2 file
>>>>
>>>> QCow2 header is extended by fields 'nb_dirty_bitmaps' and
>>>> 'dirty_bitmaps_offset' like with snapshots.
>>>>
>>>> Proposed command line syntax is the following:
>>>>
>>>> -dirty-bitmap [option1=val1][,option2=val2]...
>>>>       Available options are:
>>>>       name         The name for the bitmap (necessary).
>>>>
>>>>       file         The file to load the bitmap from.
>>>>
>>>>       file_id      When specified with 'file' option, then this file will
>>>>                    be available through this id for other -dirty-bitmap
>>>>                    options when specified without 'file' option, then it
>>>>                    is a reference to 'file', specified with another
>>>>                    -dirty-bitmap option, and it will be used to load the
>>>>                    bitmap from.
>>>>
>>>>       drive        The drive to bind the bitmap to. It should be
>>>> specified
>>>>                    as 'id' suboption of one of -drive options. If nor
>>>>                    'file' neither 'file_id' are specified, then the
>>>> bitmap
>>>>                    will be loaded from that drive (internal dirty
>>>> bitmap).
>>>>
>>>>       granularity  The granularity for the bitmap. Not necessary, the
>>>>                    default value may be used.
>>>>
>>>>       enabled      on|off. Default is 'on'. Disabled bitmaps are not
>>>>                    changing regardless of writes to corresponding drive.
>>>>
>>>> Examples:
>>>>
>>>> qemu -drive file=a.qcow2,id=disk -dirty-bitmap name=b,drive=disk
>>>> qemu -drive file=a.raw,id=disk \
>>>>        -dirty-bitmap name=b,drive=disk,file=b.qcow2,enabled=off
>>>>
>>>> Vladimir Sementsov-Ogievskiy (8):
>>>>     spec: add qcow2-dirty-bitmaps specification
>>>>     qcow2: add dirty-bitmaps feature
>>>>     block: store persistent dirty bitmaps
>>>>     block: add bdrv_load_dirty_bitmap
>>>>     qcow2: add qcow2_dirty_bitmap_delete_all
>>>>     qcow2: add autoclear bit for dirty bitmaps
>>>>     qemu: command line option for dirty bitmaps
>>>>     iotests: test internal persistent dirty bitmap
>>>>
>>>>    block.c                       |  82 +++++++
>>>>    block/Makefile.objs           |   2 +-
>>>>    block/qcow2-dirty-bitmap.c    | 537
>>>> ++++++++++++++++++++++++++++++++++++++++++
>>>>    block/qcow2.c                 |  69 +++++-
>>>>    block/qcow2.h                 |  61 +++++
>>>>    blockdev.c                    |  38 +++
>>>>    docs/specs/qcow2.txt          |  66 ++++++
>>>>    include/block/block.h         |   9 +
>>>>    include/block/block_int.h     |  10 +
>>>>    include/sysemu/blockdev.h     |   1 +
>>>>    include/sysemu/sysemu.h       |   1 +
>>>>    qemu-options.hx               |  37 +++
>>>>    tests/qemu-iotests/118        |  83 +++++++
>>>>    tests/qemu-iotests/118.out    |   5 +
>>>>    tests/qemu-iotests/group      |   1 +
>>>>    tests/qemu-iotests/iotests.py |   6 +
>>>>    vl.c                          | 100 ++++++++
>>>>    17 files changed, 1105 insertions(+), 3 deletions(-)
>>>>    create mode 100644 block/qcow2-dirty-bitmap.c
>>>>    create mode 100755 tests/qemu-iotests/118
>>>>    create mode 100644 tests/qemu-iotests/118.out
>>>>
>>> Well, you said "RFC" ... So here's some "C" that you RF'd.
>>>
>>> Many of these points are a "wish list" of sorts and don't necessarily
>>> have to be implemented all at once, but we should be careful to design
>>> the core series with the later additions in mind.
>>>
>>> Many of these items are things that I wouldn't mind working on
>>> (Primarily the QMP interfaces), provided that the core of this series
>>> will allow for them to exist. I can take many of the QMP/transaction
>>> interface projects, for instance.
>>>
>>> I'm starting to think we won't be able to squeeze this in for 2.4, but
>>> we can have a bulk of the work well underway for 2.5, by which point I
>>> am hopeful that libvirt will be beginning to pick up motion for
>>> integration of this feature.
>>>
>>> I think that the basic approach you have so far is good, we just have to
>>> plan out our required extensions and then we can review the base to make
>>> sure it supports the features we want in the near future.
>>>
>>>
>>> (1) General storage design
>>>
>>> - Persistence bitmaps can be stored in any arbitrary qcow2 file,
>>> regardless of if that qcow2 holds data or not.
>>>
>>> - Any given qcow2 file with or without data can hold bitmaps intended
>>> for any number of other drives.
>> Actually, dirty bitmap is not bound to the image, it just have a name,
>> identifying it. We can (try to) load any bitmap for any image.
>>
> I'm not sure what you mean by "bound" here, but yes, as it stands: the
> design is very flexible and I like that. It appears that bitmaps for any
> number of images can be simultaneously stored in a .qcow2 as a generic
> container, so it's a very flexible approach.
>
> I didn't mean to imply that you couldn't do that already, because it
> looks like you can.
>
>>> - Dirty bitmaps are not assumed to be able to be stored in any
>>> particular location.
>>>
>>> So far, this is good. I like the flexibility this provides. This lets us
>>> do all kinds of cool things like store bitmaps for 20 different raw
>>> drives inside of a single 'bitmaps.qcow2' if we wish.
>>>
>>>
>>> (2) Bitmaps added via QMP do not get any persistence attributes.
>>>
>>> This is something we'll need to change. Existing QMP commands that let
>>> us modify bitmaps:
>>>
>>> block-dirty-bitmap-add        [+transaction]
>>> block-dirty-bitmap-remove
>>> block-dirty-bitmap-clear    [+transaction]
>>>
>>> - block-dirty-bitmap-add:
>>>
>>> We will want the ability for bitmap-add to specify a persistence option.
>>> What I am less clear on is what this attribute should look like.
>>>
>>> should we add target: <filename> as an attribute here,
>>> or should it be target: <node> to specify the file object that we want
>>> to store this bitmap in? Or perhaps both?:
>>>
>>> mode: file, target: <filename>
>>> mode: node, target: <node>
>>>
>>> Or even an explicit usability feature that lets us specify that we wish
>>> to store the bitmap for the drive we're attaching it to:
>>>
>>> block-dirty-bitmap=add node=drive0 name=bitmap0 mode=self
>>>
>>> The implication here is that the default value for persist could be
>>> "none", which does not attempt to store this bitmap anywhere.
>>>
>>> - block-dirty-bitmap-remove
>>>
>>> If we remove a bitmap with persistence options active, it needs to be
>>> cleared out of the file it is being stored in. Currently we use
>>> "release" to remove a bitmap, which deletes only the in-memory portion
>>> of the bitmap, so you also use release in your series to delete
>>> in-memory bitmaps after we're done with them.
>>>
>>> I think the semantics of the "remove" QMP option here, however, should
>>> include a call to the storage layer to remove the bitmap in question.
>>>
>>> Let's split the "release" function into two functions:
>>> (A) bdrv_dirty_bitmap_free (which just frees the in-memory bits)
>>> (B) bdrv_dirty_bitmap_delete (which relies on _free but deletes from
>>> disk also.)
>>>
>>> Then bdrv_close can use bitmap_free, but the QMP remove command can
>>> utilize _delete.
>>>
>>> - block-dirty-bitmap-clear:
>>>
>>> This needs to clear the bitmap on-disk if it has persistence features
>>> active.
>> Does it? When the bitmap is loaded, its representation on disk is
>> inconsistent, and an in_use bit is set (on disk). So, we don't need to
>> sync it here.
>> Syncing on 'remove' is not necessary for the same reason, but may take
>> place to not store extra trash..
>>
> Keeping an in-use bit is definitely a way to accommodate this QMP
> command, so you're right, we don't need to pay special attention here --
> unless we go with some kind of a periodic flush model, at which point if
> the bitmap is "clean," this command will need to re-mark it as dirty.
>
> Just a generic ->mark_bitmap_dirty() op to call here would suffice entirely.
>
>>> - block-dirty-bitmap-copy:
>>>
>>> This is only a proposal currently, but worth us keeping it in mind. We
>>> should decide on copy semantics. Should the copy keep the persistence
>>> attributes of the source bitmap by default and allow a user to override
>>> it if desired, or should we force the persistence attribute back to
>>> null/None until the user overrides?
>>>
>>> I suspect defaulting it to no persistence is probably the sanest until
>>> we're told otherwise (either via an extension to the copy command or a
>>> later edit command.)
>>>
>>> Since the QMP interfaces has been my area so far, I can draft their
>>> addition as a new series if you'd like.
>> ok.
>>
>>>
>>> (3) Additional QMP interfaces
>>>
>>> We should add the ability to modify a bitmap's persistence after it has
>>> been added.
>>>
>>> block-dirty-bitmap-edit mode=<file,node,self,none> target=<...>
>>>
>>> This will allow us to add persistence to a bitmap after creation, or
>>> remove persistence from a bitmap without deleting it if it's no longer
>>> desired.
>>>
>>> Perhaps at a later date we could even have it change where the bitmap is
>>> stored through this mechanism.
>>>
>>> (Usability features might include the ability for us to rename or change
>>> the granularity of the bitmap, too -- but that's future usability stuff,
>>> not core functionality.)
>>>
>>> Like the above, I can draft this addition.
>> no objections)
>>
>>>
>>> (4) Storage Format
>>>
>>> I think overall the bitmap extension headers look sane, but Kevin is the
>>> ultimate authority here.
>>>
>>> I /would/ like to see an additional header bitfield reserved
>>> for some arbitrary flags that can be used at a later date. A uint32_t
>>> should be sufficient for now, with some of the upper bits reserved
>>> either for an extension or a version field to allow us to expand the
>>> bitmap headers in the future if necessary.
>> ok
>>
>>>
>>> (5) Bitmap autoloading
>>>
>>> Bitmaps are not currently automatically loaded if you pass e.g. (-hda
>>> my_drive_that_also_has_bitmaps.qcow2). This is in part because the drive
>>> a bitmap was intended for is not information stored with the bitmap, so
>>> QEMU has no concept or ability to be able to "auto load" bitmaps.
>>>
>>> Hinted at earlier by my desire to see something like mode=self, we
>>> should add some flags to the dirty bitmap header stored with each bitmap:
>>>
>>> 0x01: "This bitmap describes the file it is stored in"
>>> 0x02: "This bitmap should be auto-loaded when this file is opened."
>>> 0x04: "This bitmap is read-only (disabled.)"
>> The last one - should it be used only for auto-loading bitmaps?
>>
> Not necessarily, I just lumped it in here as an example to be grouped
> near the other flags I thought we needed. Maybe we don't actually need a
> read only flag to be stored because there's not currently a use-case for
> RO bitmaps outside of migration.
>
> Just a passing thought.
>
>>> This way, with a properly modern version of QEMU, you could simply just:
>>>
>>> qemu -M q35 -enable-kvm -hda windows10.qcow2
>>>
>>> and if there were bitmaps inside of windows10.qcow2 that had 0x01 and
>>> 0x02 set, you'd get those bitmaps loaded before any IO to the data
>>> clusters of the .qcow2, ensuring data integrity.
>>>
>>> Of course, I think that it is currently too complicated to try to
>>> accomplish autoloading of bitmaps for *other* drives, so let's not worry
>>> about that now. This means 0x02 set without 0x01 would be an error.
>>>
>>> Of course, when autoloading bitmaps, we'll have to check that the size
>>> of the bitmap matches the size of the drive. This is easy to do, though.
>> it is always checked)
>>
> You're right. I think I wasn't convinced we needed size (etc) to be part
> of the lookup process, but the way you have it now it does always check
> the sizes. Just thinking out loud again.
>
>>> The 0x01 bit can be set automatically when that circumstance is
>>> detected, and 0x02 can be set perhaps as an option to
>>> --dirty-bitmap auto=yes
>>> or via the QMP
>>> block-dirty-bitmap-add ... auto=yes
>>> or via the edit command,
>>> block-dirty-bitmap-edit ... auto=yes
>>>
>>> Maybe we could also set it implicitly if mode=self is used, too.
>> Also, for auto-loading bitmaps, user can manually load it (changing
>> 'disabled' bit). And in this case auto-loading should be skipped.
>> Also, if auto-loading is default behavior, than what about
>> --disable-bitmap-autoloading or something like this?
>>
> Agreed. Whatever the default is, we need a way to turn it off and be
> explicit about it. Perhaps as an argument to -drive?
>
> e.g.
>
> -drive if=none,file=linux-and-bitmaps.qcow2,bitmap=<auto,no>
>
> where "no" would be a very explicit "Do not load any of the bitmaps in
> this file."
>
> "auto" would load automatically any of the bitmaps stored there with the
> auto/self flags set, and skip the rest otherwise.
>
> Maybe this is serviceable.
>
>>> (6) qemu-img interface
>>>
>>> Stefan has mentioned that it would be nice to implement a query ability
>>> to qemu-img to list bitmaps stored in qcow2 files, along with some of
>>> their key attributes. size, granularity, any flags. It's probably not
>>> efficient to list the dirty count, unless we begin storing that
>>> information manually in the header. I don't think there's a strong need
>>> for that level of info, though.
>>>
>>> I can handle this part, if you'd like.
>> can qmp query block with information about bitmaps be reused here?
>>
> I don't think so. Here we'd be reading the bitmaps on-disk and reporting
> the info stored in-file, instead of the in-memory structures.
>
>>> (7) CLI interface
>>>
>>> - The only way to get a bitmap loaded into memory from file is to use
>>> the --dirty-bitmap argument where you specify the name, file,
>>> destination drive, and granularity.
>>>
>>> - The only way to create a new bitmap that will integrate with the
>>> persistence features is to specify a new bitmap that does not currently
>>> exist within a file and allow the qcow2 layer to create the in-memory
>>> bitmap for us.
>>>
>>> This helps us with the flexibility that makes this design a winning
>>> choice overall, but it's cumbersome for some special common cases I
>>> think we should be supporting.
>>>
>>> As mentioned previously, I think granularity should not be part
>>> of the lookup process -- just creation, and even then I think this CLI
>>> syntax should not automatically create bitmaps if it wasn't found -- if
>>> the user didn't intend to make a bitmap, an error is likely more
>>> appropriate.
>>>
>>> Perhaps --dirty-bitmap create=true,[...] would be sufficient for
>>> specifying intent here, at which point granularity makes sense for the
>>> creation process.
>>>
>>> As for the granularity, I think this should be appropriate:
>>>
>>> --dirty-bitmap file=bitmaps.qcow2,name=bitmap0,drive=drive0
>>>
>>> And that should be sufficient to look in bitmaps.qcow2, find 'bitmap0',
>>> and attach it to 'drive0', throwing an error if the sizes don't match.
>> agree, I will do it
>>
> Great! I promise I do like the series overall even if I had a
> book-length comment about it :)
>
>>> (8) Namespaces
>>>
>>> Stefan also asked me about the bitmap namespaces -- in-memory of course,
>>> each node can have their own "bitmap0" without any collisions because
>>> all bitmaps are always referred to by their (bs,name) pair.
>>>
>>> How do we address bitmaps inside a file, though?
>>>
>>> If any given bitmap containing .qcow2 file can store an arbitrary number
>>> of bitmaps intended for an arbitrary number of destinations, how do we
>>> handle this?
>>>
>>> EXAMPLE:
>>> -dirty-bitmap name=bitmap0,drive=drive0,file=bitmaps.qcow2
>>> -dirty-bitmap name=bitmap0,drive=drive1,file=bitmaps.qcow2
>>>
>>> I think this might currently do very funky things, if bitmaps.qcow2 is
>>> currently empty -- I think both calls will succeed, but it will fail
>>> later when it tries to store them and cannot.
>>>
>>> I think we need to do one of two things:
>>>
>>> (A) Keep the namespace inside of a .qcow2 file as it is now, but ALWAYS
>>> check up front if a bitmap *can* be added to the file. This way we don't
>>> run into problems after we've dirtied the bitmap.
>>>
> To clarify, I meant "every bitmap name inside of a file is unique. Check
> to make sure it is possible to store a new bitmap upon its creation."
>
>>> (B) Find a way to accommodate bitmaps with the same names that were
>>> intended for different nodes.
>>>
>>> I don't have a good idea for #2, so I think #1 is probably the way to
>>> go. We can amend the bitmap documentation to specify that although the
>>> bitmap names are unique per-node, if you want to store them in the same
>>> file, you're going to want to give them globally unique names.
>> A: And what about the case: several raw disks and bitmaps.qcow2? In this
>> case using of namespaces is impossible. Or we are going to have
>> *-bitmap.qcow2 for each disk..
>>
> We could continue letting users do drive0 bitmap0 and drive1 bitmap0,
> but as soon as they try to use the QMP commands to store those bitmaps,
> the QMP command will report an error if the bitmaps.qcow2 already has a
> "bitmap0."
>
> It effectively uses a per-file namespace for bitmaps and applies that
> restriction to any in-memory bitmaps created with persistence flags.
>
>> B: As I understand, we have no id or name for the image, it comes from
>> cmd line.. So we can't use node name as namespace name. Why not just add
>> namespaces?
>>
>> -drive file=a.raw,id=disk1,dirty-bitmaps-namespace=disk1_ns \
>> -drive file=b.raw,id=disk2,dirty-bitmaps-namespace=disk2_ns \
>>   -dirty-bitmap name=bitmap0,drive=disk1,file=bitmaps.qcow2
>>   -dirty-bitmap name=bitmap1,drive=disk1,file=bitmaps.qcow2
>>   -dirty-bitmap name=bitmap0,drive=disk2,file=bitmaps.qcow2
>>
>>
>> Default namespace: empty string or node name?
>> Namespace name should be stored in bitmap header for each bitmap.. As
>> separate field with length field, or may be as bitmap name part
>> (separated from it by '#' character for example)
>>
> I think that's starting to get a little too manual and verbose on the
> CLI at this point. Maybe we really should just enforce the first option
> and call it a day.
>
>>>
>>> (9) Data consistency
>>>
>>> We need to discuss the data safety element to this. I think that
>>> atomically before the first write is flushed to disk, the dirty bitmap
>>> needs to *at least* set a bit in the bitmap header that indicates that
>>> the bitmap is no longer up-to-date.
>>>
>>> When the bitmap is later flushed to disk, that bit can be cleared until
>>> the next write occurs, which repeats the process.
>>>
>>> We have discussed this (long ago) in the past, but one of the ideas was
>>> to monitor the relative utilization rate of the disk and attempt to
>>> flush the bitmap whenever there was a lull in disk IO, then clear the
>>> "inconsistent" bit.
>>>
>>> On close, the flush of data and bitmap both would lead us to clear this
>>> bit as well.
>>>
>>> Upon boot, if the inconsistent bit was set, we'd know that the bitmap
>>> was outdated and we'd have to recommend that the bitmap be cleared and a
>>> new bitmap started.
>>>
>>> (Or, perhaps, a data-intensive mode where we compare the current data
>>> mode with the most recent incremental backup to re-determine what data
>>> has changed. This would be very, very slow but an option at least for
>>> recovery if started a new full backup is even less desirable.)
>>>
>>> Other ideas involve regularly flushing the bitmap at certain timed
>>> intervals, certain usage intervals (e.g. when the changed bitmap data
>>> reaches some total size, like 64KiB of changed bits), or a combination
>>> of regular intervals with "opportunistic" flushing during Disk IO lulls.
>>>
>>> This is a key feature that absolutely needs to make it into the base
>>> series, IMO.
>> I don't understand, what the use of flushing bitmap not only on
>> disk:close? If there no failures with disk, than bitmap will be flushed
>> on close and will be consistent for next open(). If there is a disk
>> crash, even if we flush the bitmap regularly, what is the possibility of
>> crashing immediately after last flush, before further io-s?
>>
> The usage case is QEMU crash, power failure, etc. Not disk crash. If we
> periodically flush to HD, we increase the chances that we don't corrupt
> our image and bitmap.
>
> If we NEVER flush, we guarantee that any segfault or power outage will
> absolutely trash our data.
>
>>> (10) Storage Efficiency
>>>
>>> We should discuss the usage of meta bitmaps or ancillary bitmaps to
>>> record which parts of our bitmap data need to be flushed to disk in
>>> order to reduce flush/close time.
>>>
>>> The current meta bitmap implementation optimizes for 1KiB writes to the
>>> network (which fits well under the standard 1500bytes), but perhaps we
>>> could optimize for local storage block size and use this to be stingy
>>> about how much data we decide to write to disk.
>>>
>>> I believe this is another feature that should be included in the initial
>>> series as well, because it might radically impact the core design.
>> ok
>>
> Yeah, we might just end up having meta_bitmaps on all the time and rely
> on them to know what remains to be written to disk. I think Stefan
> wasn't too keen on the idea of a 512GiB disk needing to write a solid
> 1MiB of data on every close, when in practice we might be able to reduce
> it to just a handful of block writes.
>
>>> (11) Migration
>>>
>>> Stefan already touched on this, but we should be mindful of the
>>> different kinds of migration scenarios.
>>>
>>> We might migrate the disks, or they might be shared already.
>>>
>>> We might migrate (or share) a disk, but what happens if we didn't
>>> migrate or didn't share the bitmap storage file that we were using?
>>>
>>> Bitmaps without persistence data will migrate just fine, but how do we
>>> intend to migrate the persistence data itself? I suppose as a first pass
>>> we can just tap into the migration calls and migrate some properties
>>> like:
>>>
>>> "This bitmap relies on node_id=xxxx to save its bitmap"
>>>
>>> and that should probably work for either kind of storage migration
>>> tactic. The only problem would be nodes without IDs that we opened by
>>> filename ...
>> It looks like some bitmaps may be migrated automatically (i.e. created
>> on destination, if they don't exist), but others don't. This means, that
>> user should describe bitmaps in destination cmd, at least bitmaps,
>> loaded from file, not node name. And in this case, migration of
>> persistent bitmap will success if there is a bitmap on destination for
>> the same node, with the same name and granularity and with set 'file'
>> field. Otherwise migration fails..
>>
> I think I still need to think about this one for a little bit, but I
> think there's other work we can do in the meantime at least.
>
>>> ...Another technique would be for any bitmap that is persistent is to
>>> store them all first prior to migration and then allow the destination
>>> to load them anew. This would also work for either shared or migrated
>>> storage if we worked it right.
>>>
>>> It seems a little hairy, and I don't have the answers right now...
>>> Something I will ponder on the weekend.
>>
> Thanks!
Sorry for long delay with next version. And unfortunately it will be 
longer, because now I'm very busy with other work and I'll be on 
vocation from July 15 till August 3.

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
  2015-07-08 12:24       ` Vladimir Sementsov-Ogievskiy
@ 2015-07-08 15:21         ` John Snow
  0 siblings, 0 replies; 76+ messages in thread
From: John Snow @ 2015-07-08 15:21 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, pbonzini, Fam Zheng, stefanha, den



On 07/08/2015 08:24 AM, Vladimir Sementsov-Ogievskiy wrote:
> Sorry for long delay with next version. And unfortunately it will be
> longer, because now I'm very busy with other work and I'll be on
> vocation from July 15 till August 3.

That's fine!

On my end, I'm still aware of your migration series (which is fine, as
far as I know, but I'll push for inclusion again once we've made more
progress on the persistence.)

and now we're in 2.4 hard freeze, so it's okay.

I'll work on a QMP interface prototype to work with your v2 here in
anticipation of a v3.

Enjoy your vacation!

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature
  2015-06-10 14:30   ` Stefan Hajnoczi
  2015-06-12 19:02     ` John Snow
@ 2015-08-14 17:14     ` Vladimir Sementsov-Ogievskiy
  2015-08-26  9:09       ` Stefan Hajnoczi
  1 sibling, 1 reply; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-08-14 17:14 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, pbonzini, den,
	jsnow

On 10.06.2015 17:30, Stefan Hajnoczi wrote:
> On Mon, Jun 08, 2015 at 06:21:20PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>
>
>
>> +    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
>> +                     bm->l1_size * sizeof(uint64_t));
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    buf = g_malloc0(bm->l1_size * s->cluster_size);
> What is the maximum l1_size value?  cluster_size and l1_size are 32-bit
> so with 64 KB cluster_size this overflows if l1_size > 65535.  Do you
> want to cast to size_t?

Hmm. What the maximum RAM space we'd like to spend on dirty bitmap? I 
think 4Gb is too much.. So here should be limited not the l1_size but 
number of bytes needed to store the bitmap. What is maximum disk size we 
are dealing with?


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-10 15:34   ` Kevin Wolf
  2015-06-11 10:25     ` Vladimir Sementsov-Ogievskiy
@ 2015-08-24 10:46     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-08-24 10:46 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha, pbonzini, den,
	jsnow

On 10.06.2015 18:34, Kevin Wolf wrote:
> Am 08.06.2015 um 17:21 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>
>> Persistent dirty bitmaps will be saved into qcow2 files. It may be used
>> as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
>> other drives (there may be qcow2 file with zero disk size but with
>> several dirty bitmaps for other drives).
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   docs/specs/qcow2.txt | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 66 insertions(+)
>>
>> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
>> index 121dfc8..0fffba2 100644
>> --- a/docs/specs/qcow2.txt
>> +++ b/docs/specs/qcow2.txt
>> @@ -123,6 +123,7 @@ be stored. Each extension has a structure like the following:
>>                           0x00000000 - End of the header extension area
>>                           0xE2792ACA - Backing file format name
>>                           0x6803f857 - Feature name table
>> +                        0x23852875 - Dirty bitmaps
>>                           other      - Unknown header extension, can be safely
>>                                        ignored
>>   
>> @@ -166,6 +167,19 @@ the header extension data. Each entry look like this:
>>                       terminated if it has full length)
>>   
>>   
>> +== Dirty bitmaps ==
>> +
>> +Dirty bitmaps is an optional header extension. It provides a possibility of
>> +storing dirty bitmaps in qcow2 image. The fields are:
>> +
>> +          0 -  3:  nb_dirty_bitmaps
>> +                   Number of dirty bitmaps contained in the image
>> +
>> +          4 - 11:  dirty_bitmaps_offset
>> +                   Offset into the image file at which the dirty bitmaps table
>> +                   starts. Must be aligned to a cluster boundary.
>> +
>> +
>>   == Host cluster management ==
> You need to use a compatibility flag because for old qemu versions, the
> dirty bitmaps (and associated metadata) are leaked clusters and qemu-img
> check would "repair" them by resetting the refcount to 0.
>
> At second sight, I see that your patches add an autoclear flag.
> Presumably the contents of the dirty bitmaps is outdated when you
> accessed the image with an older version, so this seems right. We just
> need to document it.
>
>>   qcow2 manages the allocation of host clusters by maintaining a reference count
>> @@ -360,3 +374,55 @@ Snapshot table entry:
>>   
>>           variable:   Padding to round up the snapshot table entry size to the
>>                       next multiple of 8.
>> +
>> +
>> +== Dirty bitmaps ==
>> +
>> +The feature supports storing several dirty bitmaps in the qcow2 file.
>> +
>> +=== Cluster mapping ===
>> +
>> +Dirty bitmaps are stored using a ONE-level structure for the mapping of
>> +bitmaps to host clusters. There is only an L1 table.
>> +
>> +The L1 table has a variable size (stored in the Bitmap table entry) and may
>> +use multiple clusters, however it must be contiguous in the image file.
>> +
>> +Given an offset into the bitmap, the offset into the image file can be
>> +obtained as follows:
>> +
>> +    offset = l1_table[offset / cluster_size] + (offset % cluster_size)
>> +
>> +L1 table entry:
>> +
>> +    Bit  0 -  61:   Standard cluster descriptor
>> +
>> +        62 -  63:   Reserved
> Stefan already mentioned that we don't have a "L1" when there is only
> one level, and that you shouldn't reuse the cluster descriptors from L2
> tables.
>
>> +=== Bitmap table ===
>> +
>> +A directory of all bitmaps is stored in the bitmap table, a contiguous area in
>> +the image file, whose starting offset and length are given by the header fields
>> +dirty_bitmaps_offset and nb_dirty_bitmaps. The entries of the bitmap table have
>> +variable length, depending on the length of name and extra data.
>> +
>> +Bitmap table entry:
>> +
>> +    Byte 0 -  7:    Offset into the image file at which the L1 table for the
>> +                    bitmap starts. Must be aligned to a cluster boundary.
>> +
>> +         8 - 11:    Number of entries in the L1 table of the bitmap
> Worth using 64 bits here? This can only cover 4 * 512 GB = 2 TB for the
> smallest possible cluster size. Though it's 65536 * 512 = 32 PB for the
> default, which might be enough for a while.

We store the bitmap in RAM.. I think 2 TB bitmap should not appear, 
larger granularity should be used for big disks.

>
>> +        12 - 15:    Bitmap granularity in bytes
>> +
>> +        16 - 23:    Bitmap size in sectors
> Please don't use sectors, that's a meaningless unit. Bytes is better.
>
>> +        24 - 25:    Size of the bitmap name
> We should use a smaller limit than the possible 64k to avoid too large
> memory allocations. Nobody needs really long bitmap names.
>
>> +
>> +        variable:   The name of the bitmap (not null terminated)
>> +
>> +        variable:   Padding to round up the bitmap table entry size to the
>> +                    next multiple of 8.
> Kevin


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification Vladimir Sementsov-Ogievskiy
                     ` (2 preceding siblings ...)
  2015-06-10 15:34   ` Kevin Wolf
@ 2015-08-24 13:30   ` Vladimir Sementsov-Ogievskiy
  2015-08-24 14:08     ` Vladimir Sementsov-Ogievskiy
  2015-08-24 14:04   ` Vladimir Sementsov-Ogievskiy
  2015-08-31 22:21   ` Eric Blake
  5 siblings, 1 reply; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-08-24 13:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, Vladimir Sementsov-Ogievskiy, stefanha, pbonzini, den,
	jsnow

About structs and constraints:

Optional Header:

64bit nb_dirty_bitmaps
     valid: 1 - 65536. I think here should not be 0, in this case 
dirty-bitmap-optional-header should not exist at all. Should it instead 
be 0 - 65536
64bit dirty_bitmaps_offset
     valid: any, but dirty_bitmaps_offset % cluster_size = 0

Dirty BItmap Directory Enrty ( = bitmap header):

64bit dirty_bitmap_table_offset
     valid: any, but dirty_bitmaps_offset % cluster_size = 0
64bit nb_virtual_bits (before it was called bitmap_size)
     valid: no direct constraints (as for disk size), but it should be 
less then dirty_bitmap_table_size * cluster_size * 8 * bitmap_granularity
32bit dirty_bitmap_table_size
     ? The bitmap will take ~ dirty_bitmap_table_size * cluster_size 
bytes in RAM. What the limit should be for it?
32bit bitmap_granularity_bits ( before it was bitmap_granularity)
     valid; 0 - 63 (as for HBitmap)
     (1 << bitmap_granularity_bits) is number of virtual bits in one 
physical bit. Not related to sectors/bytes, etc. Let this format be 
closer to HBitmap than to BdrvDirtyBitmap
16bit name_size
     valid: 1 - 1023. // should it be 0 - 1023 ?
/* name follows */
/* offset to 8 bytes boundary follows */

On 08.06.2015 18:21, Vladimir Sementsov-Ogievskiy wrote:
> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>
> Persistent dirty bitmaps will be saved into qcow2 files. It may be used
> as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
> other drives (there may be qcow2 file with zero disk size but with
> several dirty bitmaps for other drives).
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   docs/specs/qcow2.txt | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 66 insertions(+)
>
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index 121dfc8..0fffba2 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -123,6 +123,7 @@ be stored. Each extension has a structure like the following:
>                           0x00000000 - End of the header extension area
>                           0xE2792ACA - Backing file format name
>                           0x6803f857 - Feature name table
> +                        0x23852875 - Dirty bitmaps
>                           other      - Unknown header extension, can be safely
>                                        ignored
>   
> @@ -166,6 +167,19 @@ the header extension data. Each entry look like this:
>                       terminated if it has full length)
>   
>   
> +== Dirty bitmaps ==
> +
> +Dirty bitmaps is an optional header extension. It provides a possibility of
> +storing dirty bitmaps in qcow2 image. The fields are:
> +
> +          0 -  3:  nb_dirty_bitmaps
> +                   Number of dirty bitmaps contained in the image
> +
> +          4 - 11:  dirty_bitmaps_offset
> +                   Offset into the image file at which the dirty bitmaps table
> +                   starts. Must be aligned to a cluster boundary.
> +
> +
>   == Host cluster management ==
>   
>   qcow2 manages the allocation of host clusters by maintaining a reference count
> @@ -360,3 +374,55 @@ Snapshot table entry:
>   
>           variable:   Padding to round up the snapshot table entry size to the
>                       next multiple of 8.
> +
> +
> +== Dirty bitmaps ==
> +
> +The feature supports storing several dirty bitmaps in the qcow2 file.
> +
> +=== Cluster mapping ===
> +
> +Dirty bitmaps are stored using a ONE-level structure for the mapping of
> +bitmaps to host clusters. There is only an L1 table.
> +
> +The L1 table has a variable size (stored in the Bitmap table entry) and may
> +use multiple clusters, however it must be contiguous in the image file.
> +
> +Given an offset into the bitmap, the offset into the image file can be
> +obtained as follows:
> +
> +    offset = l1_table[offset / cluster_size] + (offset % cluster_size)
> +
> +L1 table entry:
> +
> +    Bit  0 -  61:   Standard cluster descriptor
> +
> +        62 -  63:   Reserved
> +
> +=== Bitmap table ===
> +
> +A directory of all bitmaps is stored in the bitmap table, a contiguous area in
> +the image file, whose starting offset and length are given by the header fields
> +dirty_bitmaps_offset and nb_dirty_bitmaps. The entries of the bitmap table have
> +variable length, depending on the length of name and extra data.
> +
> +Bitmap table entry:
> +
> +    Byte 0 -  7:    Offset into the image file at which the L1 table for the
> +                    bitmap starts. Must be aligned to a cluster boundary.
> +
> +         8 - 11:    Number of entries in the L1 table of the bitmap
> +
> +        12 - 15:    Bitmap granularity in bytes
> +
> +        16 - 23:    Bitmap size in sectors
> +
> +        24 - 25:    Size of the bitmap name
> +
> +        variable:   The name of the bitmap (not null terminated)
> +
> +        variable:   Padding to round up the bitmap table entry size to the
> +                    next multiple of 8.
> +
> +The fields "size", "granularity" and "name" are corresponding with the fields
> +in struct BdrvDirtyBitmap.


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification Vladimir Sementsov-Ogievskiy
                     ` (3 preceding siblings ...)
  2015-08-24 13:30   ` Vladimir Sementsov-Ogievskiy
@ 2015-08-24 14:04   ` Vladimir Sementsov-Ogievskiy
  2015-08-31 22:21   ` Eric Blake
  5 siblings, 0 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-08-24 14:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, Vladimir Sementsov-Ogievskiy, stefanha, pbonzini, den,
	jsnow

About structs and constraints:

== Optional Header ==

64bit nb_dirty_bitmaps
     valid: 1 - 65536. I think here should not be 0, in this case 
dirty-bitmap-optional-header should not exist at all. Should it instead 
be 0 - 65536

64bit dirty_bitmaps_offset
     valid: any, but dirty_bitmaps_offset % cluster_size = 0

== Dirty BItmap Directory Enrty ( = bitmap header) ==

64bit dirty_bitmap_table_offset
     valid: any, but dirty_bitmap_table_offset % cluster_size = 0

64bit nb_virtual_bits (before it was called bitmap_size)
     valid: no direct constraints (as for disk size), but it should be 
<= dirty_bitmap_table_size * cluster_size * 8 * bitmap_granularity

32bit dirty_bitmap_table_size
     ? The bitmap will take ~ dirty_bitmap_table_size * cluster_size 
bytes in RAM. What the limit should be for it?
     for bdrv dirty bitmaps
           max disk size = QCOW_MAX_L1_SIZE / 8 * (cluster_size / 8) * 
cluster_size = 524288 * cluster_size ^ 2,
           dirty bitmap covers the following size: 
dirty_bitmap_table_size * cluster_size * 8 * byte_granularity
           therefore, to cover the whole disk,
           dirty_bitmap_table_size * cluster_size * 8 * byte_granularity 
 >= 524288 * cluster_size ^ 2
           i.e. dirty_bitmap_table_size >= 65536 * cluster_size / 
byte_granularity
           max cluster size is 0x200000 ( = 1 << 21 ), min granularity = 
sector_size = 512, so
           dirty_bitmap_table_size >= 0x10000000, and this limit should 
be enough for any current disk configuration, but, with cluster of size 
1 << 21
           and such dirty_bitmap_table_size we will have tooo huge 
bitmap in RAM and it is unreal..

32bit bitmap_granularity_bits ( before it was bitmap_granularity)
     valid; 0 - 63 (as for HBitmap)
     (1 << bitmap_granularity_bits) is number of virtual bits in one 
physical bit. Not related to sectors/bytes, etc. Let this format be 
closer to HBitmap than to BdrvDirtyBitmap

16bit name_size
     valid: 1 - 1023. // should it be 0 - 1023 ?

/* name follows */
/* offset to 8 bytes boundary follows */

==About granularity==
To make things more general, granularity is not related to 
disk/bytes/sectors. Granularity is just number of virtual bits in one 
physical bit.

Then, for bdrv dirty bitmaps (let's assume, that it may be not only one 
type of store dirty bitmaps), let establish, that one virtual bit is 
corresponding to one sector of the disk, i.e., byte granularity would be:
byte_granularity = (1 << bitmap_granularity_bits) * 512
for ex:
65536 = (1 << 7) * 512

On 08.06.2015 18:21, Vladimir Sementsov-Ogievskiy wrote:
> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>
> Persistent dirty bitmaps will be saved into qcow2 files. It may be used
> as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
> other drives (there may be qcow2 file with zero disk size but with
> several dirty bitmaps for other drives).
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   docs/specs/qcow2.txt | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 66 insertions(+)
>
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index 121dfc8..0fffba2 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt
> @@ -123,6 +123,7 @@ be stored. Each extension has a structure like the following:
>                           0x00000000 - End of the header extension area
>                           0xE2792ACA - Backing file format name
>                           0x6803f857 - Feature name table
> +                        0x23852875 - Dirty bitmaps
>                           other      - Unknown header extension, can be safely
>                                        ignored
>   
> @@ -166,6 +167,19 @@ the header extension data. Each entry look like this:
>                       terminated if it has full length)
>   
>   
> +== Dirty bitmaps ==
> +
> +Dirty bitmaps is an optional header extension. It provides a possibility of
> +storing dirty bitmaps in qcow2 image. The fields are:
> +
> +          0 -  3:  nb_dirty_bitmaps
> +                   Number of dirty bitmaps contained in the image
> +
> +          4 - 11:  dirty_bitmaps_offset
> +                   Offset into the image file at which the dirty bitmaps table
> +                   starts. Must be aligned to a cluster boundary.
> +
> +
>   == Host cluster management ==
>   
>   qcow2 manages the allocation of host clusters by maintaining a reference count
> @@ -360,3 +374,55 @@ Snapshot table entry:
>   
>           variable:   Padding to round up the snapshot table entry size to the
>                       next multiple of 8.
> +
> +
> +== Dirty bitmaps ==
> +
> +The feature supports storing several dirty bitmaps in the qcow2 file.
> +
> +=== Cluster mapping ===
> +
> +Dirty bitmaps are stored using a ONE-level structure for the mapping of
> +bitmaps to host clusters. There is only an L1 table.
> +
> +The L1 table has a variable size (stored in the Bitmap table entry) and may
> +use multiple clusters, however it must be contiguous in the image file.
> +
> +Given an offset into the bitmap, the offset into the image file can be
> +obtained as follows:
> +
> +    offset = l1_table[offset / cluster_size] + (offset % cluster_size)
> +
> +L1 table entry:
> +
> +    Bit  0 -  61:   Standard cluster descriptor
> +
> +        62 -  63:   Reserved
> +
> +=== Bitmap table ===
> +
> +A directory of all bitmaps is stored in the bitmap table, a contiguous area in
> +the image file, whose starting offset and length are given by the header fields
> +dirty_bitmaps_offset and nb_dirty_bitmaps. The entries of the bitmap table have
> +variable length, depending on the length of name and extra data.
> +
> +Bitmap table entry:
> +
> +    Byte 0 -  7:    Offset into the image file at which the L1 table for the
> +                    bitmap starts. Must be aligned to a cluster boundary.
> +
> +         8 - 11:    Number of entries in the L1 table of the bitmap
> +
> +        12 - 15:    Bitmap granularity in bytes
> +
> +        16 - 23:    Bitmap size in sectors
> +
> +        24 - 25:    Size of the bitmap name
> +
> +        variable:   The name of the bitmap (not null terminated)
> +
> +        variable:   Padding to round up the bitmap table entry size to the
> +                    next multiple of 8.
> +
> +The fields "size", "granularity" and "name" are corresponding with the fields
> +in struct BdrvDirtyBitmap.


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-08-24 13:30   ` Vladimir Sementsov-Ogievskiy
@ 2015-08-24 14:08     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-08-24 14:08 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, Vladimir Sementsov-Ogievskiy, stefanha, pbonzini, den,
	jsnow

Sorry, drop this if you, look at the new version of this litter
On 24.08.2015 16:30, Vladimir Sementsov-Ogievskiy wrote:
> About structs and constraints:
>
> Optional Header:
>
> 64bit nb_dirty_bitmaps
>     valid: 1 - 65536. I think here should not be 0, in this case 
> dirty-bitmap-optional-header should not exist at all. Should it 
> instead be 0 - 65536
> 64bit dirty_bitmaps_offset
>     valid: any, but dirty_bitmaps_offset % cluster_size = 0
>
> Dirty BItmap Directory Enrty ( = bitmap header):
>
> 64bit dirty_bitmap_table_offset
>     valid: any, but dirty_bitmaps_offset % cluster_size = 0
> 64bit nb_virtual_bits (before it was called bitmap_size)
>     valid: no direct constraints (as for disk size), but it should be 
> less then dirty_bitmap_table_size * cluster_size * 8 * bitmap_granularity
> 32bit dirty_bitmap_table_size
>     ? The bitmap will take ~ dirty_bitmap_table_size * cluster_size 
> bytes in RAM. What the limit should be for it?
> 32bit bitmap_granularity_bits ( before it was bitmap_granularity)
>     valid; 0 - 63 (as for HBitmap)
>     (1 << bitmap_granularity_bits) is number of virtual bits in one 
> physical bit. Not related to sectors/bytes, etc. Let this format be 
> closer to HBitmap than to BdrvDirtyBitmap
> 16bit name_size
>     valid: 1 - 1023. // should it be 0 - 1023 ?
> /* name follows */
> /* offset to 8 bytes boundary follows */
>
> On 08.06.2015 18:21, Vladimir Sementsov-Ogievskiy wrote:
>> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>
>> Persistent dirty bitmaps will be saved into qcow2 files. It may be used
>> as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
>> other drives (there may be qcow2 file with zero disk size but with
>> several dirty bitmaps for other drives).
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   docs/specs/qcow2.txt | 66 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 66 insertions(+)
>>
>> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
>> index 121dfc8..0fffba2 100644
>> --- a/docs/specs/qcow2.txt
>> +++ b/docs/specs/qcow2.txt
>> @@ -123,6 +123,7 @@ be stored. Each extension has a structure like 
>> the following:
>>                           0x00000000 - End of the header extension area
>>                           0xE2792ACA - Backing file format name
>>                           0x6803f857 - Feature name table
>> +                        0x23852875 - Dirty bitmaps
>>                           other      - Unknown header extension, can 
>> be safely
>>                                        ignored
>>   @@ -166,6 +167,19 @@ the header extension data. Each entry look 
>> like this:
>>                       terminated if it has full length)
>>     +== Dirty bitmaps ==
>> +
>> +Dirty bitmaps is an optional header extension. It provides a 
>> possibility of
>> +storing dirty bitmaps in qcow2 image. The fields are:
>> +
>> +          0 -  3:  nb_dirty_bitmaps
>> +                   Number of dirty bitmaps contained in the image
>> +
>> +          4 - 11:  dirty_bitmaps_offset
>> +                   Offset into the image file at which the dirty 
>> bitmaps table
>> +                   starts. Must be aligned to a cluster boundary.
>> +
>> +
>>   == Host cluster management ==
>>     qcow2 manages the allocation of host clusters by maintaining a 
>> reference count
>> @@ -360,3 +374,55 @@ Snapshot table entry:
>>             variable:   Padding to round up the snapshot table entry 
>> size to the
>>                       next multiple of 8.
>> +
>> +
>> +== Dirty bitmaps ==
>> +
>> +The feature supports storing several dirty bitmaps in the qcow2 file.
>> +
>> +=== Cluster mapping ===
>> +
>> +Dirty bitmaps are stored using a ONE-level structure for the mapping of
>> +bitmaps to host clusters. There is only an L1 table.
>> +
>> +The L1 table has a variable size (stored in the Bitmap table entry) 
>> and may
>> +use multiple clusters, however it must be contiguous in the image file.
>> +
>> +Given an offset into the bitmap, the offset into the image file can be
>> +obtained as follows:
>> +
>> +    offset = l1_table[offset / cluster_size] + (offset % cluster_size)
>> +
>> +L1 table entry:
>> +
>> +    Bit  0 -  61:   Standard cluster descriptor
>> +
>> +        62 -  63:   Reserved
>> +
>> +=== Bitmap table ===
>> +
>> +A directory of all bitmaps is stored in the bitmap table, a 
>> contiguous area in
>> +the image file, whose starting offset and length are given by the 
>> header fields
>> +dirty_bitmaps_offset and nb_dirty_bitmaps. The entries of the bitmap 
>> table have
>> +variable length, depending on the length of name and extra data.
>> +
>> +Bitmap table entry:
>> +
>> +    Byte 0 -  7:    Offset into the image file at which the L1 table 
>> for the
>> +                    bitmap starts. Must be aligned to a cluster 
>> boundary.
>> +
>> +         8 - 11:    Number of entries in the L1 table of the bitmap
>> +
>> +        12 - 15:    Bitmap granularity in bytes
>> +
>> +        16 - 23:    Bitmap size in sectors
>> +
>> +        24 - 25:    Size of the bitmap name
>> +
>> +        variable:   The name of the bitmap (not null terminated)
>> +
>> +        variable:   Padding to round up the bitmap table entry size 
>> to the
>> +                    next multiple of 8.
>> +
>> +The fields "size", "granularity" and "name" are corresponding with 
>> the fields
>> +in struct BdrvDirtyBitmap.
>
>


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
  2015-06-12 10:36     ` Stefan Hajnoczi
@ 2015-08-26  6:26       ` Vladimir Sementsov-Ogievskiy
  2015-08-26  9:13         ` Stefan Hajnoczi
  0 siblings, 1 reply; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-08-26  6:26 UTC (permalink / raw)
  To: Stefan Hajnoczi, Denis V. Lunev; +Cc: kwolf, jsnow, qemu-devel, pbonzini

On 12.06.2015 13:36, Stefan Hajnoczi wrote:
> On Fri, Jun 12, 2015 at 12:58:35PM +0300, Denis V. Lunev wrote:
>> On 11/06/15 23:06, Stefan Hajnoczi wrote:
>>> The load/store API is not scalable when bitmaps are 1 MB or larger.
>>>
>>> For example, a 500 GB disk image with 64 KB granularity requires a 1 MB
>>> bitmap.  If a guest has several disk images of this size, then multiple
>>> megabytes must be read to start the guest and written out to shut down
>>> the guest.
>>>
>>> By comparison, the L1 table for the 500 GB disk image is less than 8 KB.
>>>
>>> I think something like qcow2-cache.c or metabitmaps should be used to
>>> lazily read/write persistent bitmaps.  That way only small portions need
>>> to be read/written at a time.
>>>
>>> Stefan
>> for the first iteration we could open the image, start tracking,
>> read bitmap as one entity in the background and or read
>> and collected data.
>>
>> partial read could be done in the next step
> Making bitmap load/store fully lazy will require changes to the
> load/store API, so it's worth thinking about a little upfront.
> Otherwise there will be a lot of code churn when the fully lazy patches
> are posted.  As a reviewer it's in my interest to only spend time
> reviewing the final version instead of code that gets thrown out :-),
> but I understand.
>
> If you can make the read lazy to some extent that's a good start.
That way we can improve load performance, but what about store?

I see two solutions:
1) meta bitmaps (already mentioned)
2) Always (optionally?) have two bitmaps instead one: backing, which 
should be equal to the bitmap, already stored to the image, and active 
delta. This can be used instead of meta bitmaps in migration too.

difference:
with meta bitmaps we have double time overhead for writing to the bitmap 
(which is more often operations as I think),
with second approach we have double overhead for read from the bitmap 
(but for backup, we can *or* these two bitmaps once, and it can be done 
fast, using the power of HBitmap). And of course double ram overhead..

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature
  2015-08-14 17:14     ` Vladimir Sementsov-Ogievskiy
@ 2015-08-26  9:09       ` Stefan Hajnoczi
  0 siblings, 0 replies; 76+ messages in thread
From: Stefan Hajnoczi @ 2015-08-26  9:09 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, pbonzini, den,
	jsnow

On Fri, Aug 14, 2015 at 08:14:46PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 10.06.2015 17:30, Stefan Hajnoczi wrote:
> >On Mon, Jun 08, 2015 at 06:21:20PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> >>+    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
> >>+                     bm->l1_size * sizeof(uint64_t));
> >>+    if (ret < 0) {
> >>+        goto fail;
> >>+    }
> >>+
> >>+    buf = g_malloc0(bm->l1_size * s->cluster_size);
> >What is the maximum l1_size value?  cluster_size and l1_size are 32-bit
> >so with 64 KB cluster_size this overflows if l1_size > 65535.  Do you
> >want to cast to size_t?
> 
> Hmm. What the maximum RAM space we'd like to spend on dirty bitmap? I think
> 4Gb is too much.. So here should be limited not the l1_size but number of
> bytes needed to store the bitmap. What is maximum disk size we are dealing
> with?

Modern file systems support up to exa- (XFS) or zetta- (ZFS) byte size.
If the disk image size is large, then the cluster size will probably
also be set larger than 64 KB (e.g. 1 MB).

Anyway, with 64 KB cluster size & bitmap granularity a 128 MB dirty
bitmap covers a 64 TB disk image.  So how about 256 MB or 512 MB max
dirty bitmap size?

Stefan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
  2015-08-26  6:26       ` Vladimir Sementsov-Ogievskiy
@ 2015-08-26  9:13         ` Stefan Hajnoczi
  0 siblings, 0 replies; 76+ messages in thread
From: Stefan Hajnoczi @ 2015-08-26  9:13 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: kwolf, qemu-devel, pbonzini, Denis V. Lunev, jsnow

On Wed, Aug 26, 2015 at 09:26:20AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 12.06.2015 13:36, Stefan Hajnoczi wrote:
> >On Fri, Jun 12, 2015 at 12:58:35PM +0300, Denis V. Lunev wrote:
> >>On 11/06/15 23:06, Stefan Hajnoczi wrote:
> >>>The load/store API is not scalable when bitmaps are 1 MB or larger.
> >>>
> >>>For example, a 500 GB disk image with 64 KB granularity requires a 1 MB
> >>>bitmap.  If a guest has several disk images of this size, then multiple
> >>>megabytes must be read to start the guest and written out to shut down
> >>>the guest.
> >>>
> >>>By comparison, the L1 table for the 500 GB disk image is less than 8 KB.
> >>>
> >>>I think something like qcow2-cache.c or metabitmaps should be used to
> >>>lazily read/write persistent bitmaps.  That way only small portions need
> >>>to be read/written at a time.
> >>>
> >>>Stefan
> >>for the first iteration we could open the image, start tracking,
> >>read bitmap as one entity in the background and or read
> >>and collected data.
> >>
> >>partial read could be done in the next step
> >Making bitmap load/store fully lazy will require changes to the
> >load/store API, so it's worth thinking about a little upfront.
> >Otherwise there will be a lot of code churn when the fully lazy patches
> >are posted.  As a reviewer it's in my interest to only spend time
> >reviewing the final version instead of code that gets thrown out :-),
> >but I understand.
> >
> >If you can make the read lazy to some extent that's a good start.
> That way we can improve load performance, but what about store?
> 
> I see two solutions:
> 1) meta bitmaps (already mentioned)
> 2) Always (optionally?) have two bitmaps instead one: backing, which should
> be equal to the bitmap, already stored to the image, and active delta. This
> can be used instead of meta bitmaps in migration too.
> 
> difference:
> with meta bitmaps we have double time overhead for writing to the bitmap
> (which is more often operations as I think),
> with second approach we have double overhead for read from the bitmap (but
> for backup, we can *or* these two bitmaps once, and it can be done fast,
> using the power of HBitmap). And of course double ram overhead..

Meta bitmaps seem like a good idea since they are already needed for
live migration.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature
  2015-06-12 21:55   ` John Snow
@ 2015-08-26 13:15     ` Vladimir Sementsov-Ogievskiy
  2015-08-26 14:14       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-08-26 13:15 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, pbonzini, Vladimir Sementsov-Ogievskiy, stefanha, den

On 13.06.2015 00:55, John Snow wrote:
>
> On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
>> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>
>> Adds dirty-bitmaps feature to qcow2 format as specified in
>> docs/specs/qcow2.txt
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
...
>> +int qcow2_dirty_bitmap_store(BlockDriverState *bs, uint8_t *buf,
>> +                            const char *name, uint64_t size,
>> +                            int granularity)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int cl_size = s->cluster_size;
>> +    int i, dirty_bitmap_index, ret = 0, n;
>> +    uint64_t *l1_table;
>> +    QCowDirtyBitmap *bm;
>> +    uint64_t buf_size;
>> +    uint8_t *p;
>> +    int sector_granularity = granularity >> BDRV_SECTOR_BITS;
>> +
>> +    /* find/create dirty bitmap */
>> +    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
>> +    if (dirty_bitmap_index >= 0) {
>> +        bm = s->dirty_bitmaps + dirty_bitmap_index;
>> +
>> +        if (size != bm->bitmap_size ||
>> +            granularity != bm->bitmap_granularity) {
>> +            qcow2_dirty_bitmap_delete(bs, name, NULL);
> If this fails, we should 'return ret'.
>
>> +            dirty_bitmap_index = -1;
>> +        }
>> +    }
> Oh, find_dirty_bitmap_by_name only looks by name, but then you check to
> make sure the size and granularity matches. If it doesn't, you actually
> create a new bitmap with the *same name* but different attributes, and
> delete the old one.
>
> Is that appropriate? I guess if we're already here in store, it means we
> made it past the add checks... which means for whatever reason we
> definitely want to store *this* bitmap...
>
> I think this code is a little extraneous, it might be best to just issue
> an ultimatum that "You can't have two bitmaps with the same name in a
> file." and let that be that -- finding something with the wrong size
> would just simply be an error.
>
>> +    if (dirty_bitmap_index < 0) {
>> +        qcow2_dirty_bitmap_create(bs, name, size, granularity);
> If this fails, we need to return ret immediately.

Not agree. I think it's ok for qcow2_dirty_bitmap_store to store given 
bitmap if it can. It can in two cases (in next patchset version):

1) found the bitmap with the same name, size and granularity: it is 
assumed to be the previous version and will be rewritten
2) not found the bitmap: it's ok, just save it.. This case works when 
the bitmap was created while qemu runs.



>
>> +        dirty_bitmap_index = s->nb_dirty_bitmaps - 1;
>> +    }
>> +    bm = s->dirty_bitmaps + dirty_bitmap_index;
>> +
>> +    /* read l1 table */
>> +    l1_table = g_malloc(bm->l1_size * sizeof(uint64_t));
>> +    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
>> +                     bm->l1_size * sizeof(uint64_t));
>> +    if (ret < 0) {
>> +        goto finish;
>> +    }
>> +
>> +    buf_size = (((size - 1) / sector_granularity) >> 3) + 1;
>> +    buf_size = align_offset(buf_size, 4);
>> +    n = buf_size / cl_size;
>> +    p = buf;
>> +    for (i = 0; i < bm->l1_size; ++i) {
>> +        uint64_t addr = be64_to_cpu(l1_table[i]) & ~511;
>> +        int write_size = (i == n ? (buf_size % cl_size) : cl_size);
>> +
>> +        if (buffer_is_zero(p, write_size)) {
>> +            if (addr) {
>> +                qcow2_free_clusters(bs, addr, cl_size,
>> +                                    QCOW2_DISCARD_ALWAYS);
>> +            }
>> +            l1_table[i] = cpu_to_be64(1);
>> +        } else {
>> +            if (!addr) {
>> +                addr = qcow2_alloc_clusters(bs, cl_size);
>> +                l1_table[i] = cpu_to_be64(addr);
>> +            }
>> +
>> +            ret = bdrv_pwrite(bs->file, addr, p, write_size);
>> +            if (ret < 0) {
>> +                goto finish;
>> +            }
>> +        }
>> +
>> +        p += cl_size;
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs->file, bm->l1_table_offset, l1_table,
>> +                      bm->l1_size * sizeof(uint64_t));
>> +    if (ret < 0) {
>> +        goto finish;
>> +    }
>> +
>> +finish:
>> +    g_free(l1_table);
>> +    return ret;
>> +}
>> +/* if no id is provided, a new one is constructed */
>> +int qcow2_dirty_bitmap_create(BlockDriverState *bs, const char *name,

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature
  2015-08-26 13:15     ` Vladimir Sementsov-Ogievskiy
@ 2015-08-26 14:14       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-08-26 14:14 UTC (permalink / raw)
  To: John Snow, qemu-devel
  Cc: kwolf, pbonzini, Vladimir Sementsov-Ogievskiy, stefanha, den

On 26.08.2015 16:15, Vladimir Sementsov-Ogievskiy wrote:
> On 13.06.2015 00:55, John Snow wrote:
>>
>> On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>>
>>> Adds dirty-bitmaps feature to qcow2 format as specified in
>>> docs/specs/qcow2.txt
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
> ...
>>> +int qcow2_dirty_bitmap_store(BlockDriverState *bs, uint8_t *buf,
>>> +                            const char *name, uint64_t size,
>>> +                            int granularity)
>>> +{
>>> +    BDRVQcowState *s = bs->opaque;
>>> +    int cl_size = s->cluster_size;
>>> +    int i, dirty_bitmap_index, ret = 0, n;
>>> +    uint64_t *l1_table;
>>> +    QCowDirtyBitmap *bm;
>>> +    uint64_t buf_size;
>>> +    uint8_t *p;
>>> +    int sector_granularity = granularity >> BDRV_SECTOR_BITS;
>>> +
>>> +    /* find/create dirty bitmap */
>>> +    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
>>> +    if (dirty_bitmap_index >= 0) {
>>> +        bm = s->dirty_bitmaps + dirty_bitmap_index;
>>> +
>>> +        if (size != bm->bitmap_size ||
>>> +            granularity != bm->bitmap_granularity) {
>>> +            qcow2_dirty_bitmap_delete(bs, name, NULL);
>> If this fails, we should 'return ret'.
>>
>>> +            dirty_bitmap_index = -1;
>>> +        }
>>> +    }
>> Oh, find_dirty_bitmap_by_name only looks by name, but then you check to
>> make sure the size and granularity matches. If it doesn't, you actually
>> create a new bitmap with the *same name* but different attributes, and
>> delete the old one.
>>
>> Is that appropriate? I guess if we're already here in store, it means we
>> made it past the add checks... which means for whatever reason we
>> definitely want to store *this* bitmap...
>>
>> I think this code is a little extraneous, it might be best to just issue
>> an ultimatum that "You can't have two bitmaps with the same name in a
>> file." and let that be that -- finding something with the wrong size
>> would just simply be an error.
>>
>>> +    if (dirty_bitmap_index < 0) {
>>> +        qcow2_dirty_bitmap_create(bs, name, size, granularity);
>> If this fails, we need to return ret immediately.
>
> Not agree. I think it's ok for qcow2_dirty_bitmap_store to store given 
> bitmap if it can. It can in two cases (in next patchset version):
>
> 1) found the bitmap with the same name, size and granularity: it is 
> assumed to be the previous version and will be rewritten
> 2) not found the bitmap: it's ok, just save it.. This case works when 
> the bitmap was created while qemu runs.
>

Oh, sorry. You mean,
ret = qcow2_dirty_bitmap_create ...
if (ret < 0) {
return ret;
}

, ok

>
>
>>
>>> +        dirty_bitmap_index = s->nb_dirty_bitmaps - 1;
>>> +    }
>>> +    bm = s->dirty_bitmaps + dirty_bitmap_index;
>>> +
>>> +    /* read l1 table */
>>> +    l1_table = g_malloc(bm->l1_size * sizeof(uint64_t));
>>> +    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
>>> +                     bm->l1_size * sizeof(uint64_t));
>>> +    if (ret < 0) {
>>> +        goto finish;
>>> +    }
>>> +
>>> +    buf_size = (((size - 1) / sector_granularity) >> 3) + 1;
>>> +    buf_size = align_offset(buf_size, 4);
>>> +    n = buf_size / cl_size;
>>> +    p = buf;
>>> +    for (i = 0; i < bm->l1_size; ++i) {
>>> +        uint64_t addr = be64_to_cpu(l1_table[i]) & ~511;
>>> +        int write_size = (i == n ? (buf_size % cl_size) : cl_size);
>>> +
>>> +        if (buffer_is_zero(p, write_size)) {
>>> +            if (addr) {
>>> +                qcow2_free_clusters(bs, addr, cl_size,
>>> +                                    QCOW2_DISCARD_ALWAYS);
>>> +            }
>>> +            l1_table[i] = cpu_to_be64(1);
>>> +        } else {
>>> +            if (!addr) {
>>> +                addr = qcow2_alloc_clusters(bs, cl_size);
>>> +                l1_table[i] = cpu_to_be64(addr);
>>> +            }
>>> +
>>> +            ret = bdrv_pwrite(bs->file, addr, p, write_size);
>>> +            if (ret < 0) {
>>> +                goto finish;
>>> +            }
>>> +        }
>>> +
>>> +        p += cl_size;
>>> +    }
>>> +
>>> +    ret = bdrv_pwrite(bs->file, bm->l1_table_offset, l1_table,
>>> +                      bm->l1_size * sizeof(uint64_t));
>>> +    if (ret < 0) {
>>> +        goto finish;
>>> +    }
>>> +
>>> +finish:
>>> +    g_free(l1_table);
>>> +    return ret;
>>> +}
>>> +/* if no id is provided, a new one is constructed */
>>> +int qcow2_dirty_bitmap_create(BlockDriverState *bs, const char *name,
>


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] qcow2: add autoclear bit for dirty bitmaps
  2015-06-09 15:50   ` Stefan Hajnoczi
@ 2015-08-27  7:45     ` Vladimir Sementsov-Ogievskiy
  2015-08-31 11:06       ` Vladimir Sementsov-Ogievskiy
  2015-08-31 22:39       ` Eric Blake
  0 siblings, 2 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-08-27  7:45 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha, den,
	pbonzini, jsnow

On 09.06.2015 18:50, Stefan Hajnoczi wrote:
> On Mon, Jun 08, 2015 at 06:21:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index 406e55d..f85a55a 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -182,6 +182,14 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>>                   return ret;
>>               }
>>   
>> +            if (!(s->autoclear_features & QCOW2_AUTOCLEAR_DIRTY_BITMAPS) &&
>> +                s->nb_dirty_bitmaps > 0) {
>> +                ret = qcow2_delete_all_dirty_bitmaps(bs, errp);
>> +                if (ret < 0) {
>> +                    return ret;
>> +                }
>> +            }
>> +
> What if the file is read-only?
>
> We shouldn't modify the file in qcow2_read_extensions().
But where? In qcow2_open? Or nowhere? I think auto clear extensions 
should be cleared automatically..

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
  2015-06-24  0:21     ` John Snow
  2015-07-08 12:24       ` Vladimir Sementsov-Ogievskiy
@ 2015-08-27 10:08       ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-08-27 10:08 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: kwolf, pbonzini, Fam Zheng, stefanha, den

On 24.06.2015 03:21, John Snow wrote:
>
> On 06/17/2015 10:29 AM, Vladimir Sementsov-Ogievskiy wrote:
>> On 12.06.2015 22:34, John Snow wrote:
>>>
...
>>>
>>> (9) Data consistency
>>>
>>> We need to discuss the data safety element to this. I think that
>>> atomically before the first write is flushed to disk, the dirty bitmap
>>> needs to *at least* set a bit in the bitmap header that indicates that
>>> the bitmap is no longer up-to-date.
>>>
>>> When the bitmap is later flushed to disk, that bit can be cleared until
>>> the next write occurs, which repeats the process.

Not the next write, but next change in the bitmap. Write possibly may 
not change the bitmap (if corresponding bit is already dirty). This is 
the key thing, which can seriously extent life of in_use=0.

>>>
>>> We have discussed this (long ago) in the past, but one of the ideas was
>>> to monitor the relative utilization rate of the disk and attempt to
>>> flush the bitmap whenever there was a lull in disk IO, then clear the
>>> "inconsistent" bit.
>>>
>>> On close, the flush of data and bitmap both would lead us to clear this
>>> bit as well.
>>>
>>> Upon boot, if the inconsistent bit was set, we'd know that the bitmap
>>> was outdated and we'd have to recommend that the bitmap be cleared and a
>>> new bitmap started.
>>>
>>> (Or, perhaps, a data-intensive mode where we compare the current data
>>> mode with the most recent incremental backup to re-determine what data
>>> has changed. This would be very, very slow but an option at least for
>>> recovery if started a new full backup is even less desirable.)
>>>
>>> Other ideas involve regularly flushing the bitmap at certain timed
>>> intervals, certain usage intervals (e.g. when the changed bitmap data
>>> reaches some total size, like 64KiB of changed bits), or a combination
>>> of regular intervals with "opportunistic" flushing during Disk IO lulls.
>>>
>>> This is a key feature that absolutely needs to make it into the base
>>> series, IMO.
>> I don't understand, what the use of flushing bitmap not only on
>> disk:close? If there no failures with disk, than bitmap will be flushed
>> on close and will be consistent for next open(). If there is a disk
>> crash, even if we flush the bitmap regularly, what is the possibility of
>> crashing immediately after last flush, before further io-s?
>>
> The usage case is QEMU crash, power failure, etc. Not disk crash. If we
> periodically flush to HD, we increase the chances that we don't corrupt
> our image and bitmap.
>
> If we NEVER flush, we guarantee that any segfault or power outage will
> absolutely trash our data.

Also, I have the following idea:

Disk is written often.
Bitmap is updated more seldom.
HBitmap previous level is updated even more seldom..

To not store all bitmap levels in file, just save in the image file the 
number of largest consistent level:


flush bitmap: consistent_level = HBITMAP_MAX_LEVEL

change bitmap level X: if consistent_level > X then consistent_level = X 
- 1 (and flush consistent_level to file)

Then, after fail, we can restore the bitmap from last consistent level:

gran = 1 << (level_bits * (HBITMAP_MAX_LEVEL - consistent_level))
bitmap[i] = bitmap[i - i % gran] OR bitmap[i - i % gran + 1] OR ... OR 
bitmap[i - i % gran + (gran - 1)]


to make this scheme independent of HBitmap, it may be better to number 
levels from 0 (0 is largest level), and save level_bits to Image file too.


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature Vladimir Sementsov-Ogievskiy
                     ` (3 preceding siblings ...)
  2015-06-12 21:55   ` John Snow
@ 2015-08-27 12:43   ` Vladimir Sementsov-Ogievskiy
  4 siblings, 0 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-08-27 12:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, Vladimir Sementsov-Ogievskiy, stefanha, pbonzini, den,
	jsnow

I've made this similar to snapshots, but now I think...

Why should we maintain dirty bitmap directory in ram? This only gives us 
saving of one extra bdrv_read of header on loading bitmap, but is 
doesn't matter in comparison with reading the whole bitmap to memory. 
Also, the bitmap should not be loaded more then once.

Maintaining dirty bitmap directory in ram also saves extra read and 
conversion BE->cpu on updating header (for changing in_use). But I'm not 
sure that this is serious, because write and conversion cpu->BE should 
be done anyway.

Drawback of current approach is obvious: extra code, extra structs, 
extra conversions.

One read of the whole table may be will be needed for loading 
auto-loading bitmaps, but then this table can be freed from ram.

To have fast access to bitmap headers (for changing in_use field, for 
ex.), may be it will be good to maintain in ram mapping from bitmap name 
to offset in the image. List of pairs "struct {const char *name, 
uint64_t offset}" for example..

On 08.06.2015 18:21, Vladimir Sementsov-Ogievskiy wrote:
> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>
> Adds dirty-bitmaps feature to qcow2 format as specified in
> docs/specs/qcow2.txt
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>   block/Makefile.objs        |   2 +-
>   block/qcow2-dirty-bitmap.c | 503 +++++++++++++++++++++++++++++++++++++++++++++
>   block/qcow2.c              |  56 +++++
>   block/qcow2.h              |  50 +++++
>   include/block/block_int.h  |  10 +
>   5 files changed, 620 insertions(+), 1 deletion(-)
>   create mode 100644 block/qcow2-dirty-bitmap.c
>
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 0d8c2a4..bff12b4 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -1,5 +1,5 @@
>   block-obj-y += raw_bsd.o qcow.o vdi.o vmdk.o cloop.o bochs.o vpc.o vvfat.o
> -block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
> +block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o qcow2-dirty-bitmap.o
>   block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>   block-obj-y += qed-check.o
>   block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
> diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
> new file mode 100644
> index 0000000..bc0167c
> --- /dev/null
> +++ b/block/qcow2-dirty-bitmap.c
> @@ -0,0 +1,503 @@
> +/*
> + * Dirty bitmpas for the QCOW version 2 format
> + *
> + * Copyright (c) 2014-2015 Vladimir Sementsov-Ogievskiy
> + *
> + * This file is derived from qcow2-snapshot.c, original copyright:
> + * Copyright (c) 2004-2006 Fabrice Bellard
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu-common.h"
> +#include "block/block_int.h"
> +#include "block/qcow2.h"
> +
> +void qcow2_free_dirty_bitmaps(BlockDriverState *bs)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +        g_free(s->dirty_bitmaps[i].name);
> +    }
> +    g_free(s->dirty_bitmaps);
> +    s->dirty_bitmaps = NULL;
> +    s->nb_dirty_bitmaps = 0;
> +}
> +
> +int qcow2_read_dirty_bitmaps(BlockDriverState *bs)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    QCowDirtyBitmapHeader h;
> +    QCowDirtyBitmap *bm;
> +    int i, name_size;
> +    int64_t offset;
> +    int ret;
> +
> +    if (!s->nb_dirty_bitmaps) {
> +        s->dirty_bitmaps = NULL;
> +        s->dirty_bitmaps_size = 0;
> +        return 0;
> +    }
> +
> +    offset = s->dirty_bitmaps_offset;
> +    s->dirty_bitmaps = g_new0(QCowDirtyBitmap, s->nb_dirty_bitmaps);
> +
> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +        /* Read statically sized part of the dirty_bitmap header */
> +        offset = align_offset(offset, 8);
> +        ret = bdrv_pread(bs->file, offset, &h, sizeof(h));
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +
> +        offset += sizeof(h);
> +        bm = s->dirty_bitmaps + i;
> +        bm->l1_table_offset = be64_to_cpu(h.l1_table_offset);
> +        bm->l1_size = be32_to_cpu(h.l1_size);
> +        bm->bitmap_granularity = be32_to_cpu(h.bitmap_granularity);
> +        bm->bitmap_size = be64_to_cpu(h.bitmap_size);
> +
> +        name_size = be16_to_cpu(h.name_size);
> +
> +        /* Read dirty_bitmap name */
> +        bm->name = g_malloc(name_size + 1);
> +        ret = bdrv_pread(bs->file, offset, bm->name, name_size);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        offset += name_size;
> +        bm->name[name_size] = '\0';
> +
> +        if (offset - s->dirty_bitmaps_offset > QCOW_MAX_DIRTY_BITMAPS_SIZE) {
> +            ret = -EFBIG;
> +            goto fail;
> +        }
> +    }
> +
> +    assert(offset - s->dirty_bitmaps_offset <= INT_MAX);
> +    s->dirty_bitmaps_size = offset - s->dirty_bitmaps_offset;
> +    return 0;
> +
> +fail:
> +    qcow2_free_dirty_bitmaps(bs);
> +    return ret;
> +}
> +
> +/* Add at the end of the file a new table of dirty bitmaps */
> +static int qcow2_write_dirty_bitmaps(BlockDriverState *bs)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    QCowDirtyBitmap *bm;
> +    QCowDirtyBitmapHeader h;
> +    int i, name_size, dirty_bitmaps_size;
> +    int64_t offset, dirty_bitmaps_offset = 0;
> +    int ret;
> +
> +    int old_dirty_bitmaps_size = s->dirty_bitmaps_size;
> +    int64_t old_dirty_bitmaps_offset = s->dirty_bitmaps_offset;
> +
> +    /* Compute the size of the dirty bitmaps table */
> +    offset = 0;
> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +        bm = s->dirty_bitmaps + i;
> +        offset = align_offset(offset, 8);
> +        offset += sizeof(h);
> +        offset += strlen(bm->name);
> +
> +        if (offset > QCOW_MAX_DIRTY_BITMAPS_SIZE) {
> +            ret = -EFBIG;
> +            goto fail;
> +        }
> +    }
> +
> +    assert(offset <= INT_MAX);
> +    dirty_bitmaps_size = offset;
> +
> +    /* Allocate space for the new dirty bitmap table */
> +    dirty_bitmaps_offset = qcow2_alloc_clusters(bs, dirty_bitmaps_size);
> +    offset = dirty_bitmaps_offset;
> +    if (offset < 0) {
> +        ret = offset;
> +        goto fail;
> +    }
> +    ret = bdrv_flush(bs);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    /* The dirty bitmap table position has not yet been updated, so these
> +     * clusters must indeed be completely free */
> +    ret = qcow2_pre_write_overlap_check(bs, 0, offset, dirty_bitmaps_size);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    /* Write all dirty bitmaps to the new table */
> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +        bm = s->dirty_bitmaps + i;
> +        memset(&h, 0, sizeof(h));
> +        h.l1_table_offset = cpu_to_be64(bm->l1_table_offset);
> +        h.l1_size = cpu_to_be32(bm->l1_size);
> +        h.bitmap_granularity = cpu_to_be32(bm->bitmap_granularity);
> +        h.bitmap_size = cpu_to_be64(bm->bitmap_size);
> +
> +        name_size = strlen(bm->name);
> +        assert(name_size <= UINT16_MAX);
> +        h.name_size = cpu_to_be16(name_size);
> +        offset = align_offset(offset, 8);
> +
> +        ret = bdrv_pwrite(bs->file, offset, &h, sizeof(h));
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        offset += sizeof(h);
> +
> +        ret = bdrv_pwrite(bs->file, offset, bm->name, name_size);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        offset += name_size;
> +    }
> +
> +    /*
> +     * Update the header extension to point to the new dirty bitmap table. This
> +     * requires the new table and its refcounts to be stable on disk.
> +     */
> +    ret = bdrv_flush(bs);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    s->dirty_bitmaps_offset = dirty_bitmaps_offset;
> +    s->dirty_bitmaps_size = dirty_bitmaps_size;
> +    ret = qcow2_update_header(bs);
> +    if (ret < 0) {
> +        fprintf(stderr, "Could not update qcow2 header\n");
> +        goto fail;
> +    }
> +
> +    /* Free old dirty bitmap table */
> +    qcow2_free_clusters(bs, old_dirty_bitmaps_offset, old_dirty_bitmaps_size,
> +                        QCOW2_DISCARD_ALWAYS);
> +    return 0;
> +
> +fail:
> +    if (dirty_bitmaps_offset > 0) {
> +        qcow2_free_clusters(bs, dirty_bitmaps_offset, dirty_bitmaps_size,
> +                            QCOW2_DISCARD_ALWAYS);
> +    }
> +    return ret;
> +}
> +
> +static int find_dirty_bitmap_by_name(BlockDriverState *bs,
> +                                     const char *name)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->nb_dirty_bitmaps; i++) {
> +        if (!strcmp(s->dirty_bitmaps[i].name, name)) {
> +            return i;
> +        }
> +    }
> +
> +    return -1;
> +}
> +
> +uint8_t *qcow2_dirty_bitmap_load(BlockDriverState *bs,
> +                            const char *name, uint64_t size,
> +                            int granularity)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int i, dirty_bitmap_index, ret;
> +    uint64_t offset;
> +    QCowDirtyBitmap *bm;
> +    uint64_t *l1_table;
> +    uint8_t *buf;
> +
> +    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
> +    if (dirty_bitmap_index < 0) {
> +        return NULL;
> +    }
> +    bm = &s->dirty_bitmaps[dirty_bitmap_index];
> +
> +    if (size != bm->bitmap_size || granularity != bm->bitmap_granularity) {
> +        return NULL;
> +    }
> +
> +    l1_table = g_malloc(bm->l1_size * sizeof(uint64_t));
> +    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
> +                     bm->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    buf = g_malloc0(bm->l1_size * s->cluster_size);
> +    for (i = 0; i < bm->l1_size; ++i) {
> +        offset = be64_to_cpu(l1_table[i]);
> +        if (!(offset & 1)) {
> +            ret = bdrv_pread(bs->file, offset, buf + i * s->cluster_size,
> +                             s->cluster_size);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        }
> +    }
> +
> +    g_free(l1_table);
> +    return buf;
> +
> +fail:
> +    g_free(l1_table);
> +    return NULL;
> +}
> +
> +int qcow2_dirty_bitmap_store(BlockDriverState *bs, uint8_t *buf,
> +                            const char *name, uint64_t size,
> +                            int granularity)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int cl_size = s->cluster_size;
> +    int i, dirty_bitmap_index, ret = 0, n;
> +    uint64_t *l1_table;
> +    QCowDirtyBitmap *bm;
> +    uint64_t buf_size;
> +    uint8_t *p;
> +    int sector_granularity = granularity >> BDRV_SECTOR_BITS;
> +
> +    /* find/create dirty bitmap */
> +    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
> +    if (dirty_bitmap_index >= 0) {
> +        bm = s->dirty_bitmaps + dirty_bitmap_index;
> +
> +        if (size != bm->bitmap_size ||
> +            granularity != bm->bitmap_granularity) {
> +            qcow2_dirty_bitmap_delete(bs, name, NULL);
> +            dirty_bitmap_index = -1;
> +        }
> +    }
> +    if (dirty_bitmap_index < 0) {
> +        qcow2_dirty_bitmap_create(bs, name, size, granularity);
> +        dirty_bitmap_index = s->nb_dirty_bitmaps - 1;
> +    }
> +    bm = s->dirty_bitmaps + dirty_bitmap_index;
> +
> +    /* read l1 table */
> +    l1_table = g_malloc(bm->l1_size * sizeof(uint64_t));
> +    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
> +                     bm->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        goto finish;
> +    }
> +
> +    buf_size = (((size - 1) / sector_granularity) >> 3) + 1;
> +    buf_size = align_offset(buf_size, 4);
> +    n = buf_size / cl_size;
> +    p = buf;
> +    for (i = 0; i < bm->l1_size; ++i) {
> +        uint64_t addr = be64_to_cpu(l1_table[i]) & ~511;
> +        int write_size = (i == n ? (buf_size % cl_size) : cl_size);
> +
> +        if (buffer_is_zero(p, write_size)) {
> +            if (addr) {
> +                qcow2_free_clusters(bs, addr, cl_size,
> +                                    QCOW2_DISCARD_ALWAYS);
> +            }
> +            l1_table[i] = cpu_to_be64(1);
> +        } else {
> +            if (!addr) {
> +                addr = qcow2_alloc_clusters(bs, cl_size);
> +                l1_table[i] = cpu_to_be64(addr);
> +            }
> +
> +            ret = bdrv_pwrite(bs->file, addr, p, write_size);
> +            if (ret < 0) {
> +                goto finish;
> +            }
> +        }
> +
> +        p += cl_size;
> +    }
> +
> +    ret = bdrv_pwrite(bs->file, bm->l1_table_offset, l1_table,
> +                      bm->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        goto finish;
> +    }
> +
> +finish:
> +    g_free(l1_table);
> +    return ret;
> +}
> +/* if no id is provided, a new one is constructed */
> +int qcow2_dirty_bitmap_create(BlockDriverState *bs, const char *name,
> +                              uint64_t size, int granularity)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    QCowDirtyBitmap *new_dirty_bitmap_list = NULL;
> +    QCowDirtyBitmap *old_dirty_bitmap_list = NULL;
> +    QCowDirtyBitmap sn1, *bm = &sn1;
> +    int i, ret;
> +    uint64_t *l1_table = NULL;
> +    int64_t l1_table_offset;
> +    int sector_granularity = granularity >> BDRV_SECTOR_BITS;
> +
> +    if (s->nb_dirty_bitmaps >= QCOW_MAX_DIRTY_BITMAPS) {
> +        return -EFBIG;
> +    }
> +
> +    memset(bm, 0, sizeof(*bm));
> +
> +    /* Check that the ID is unique */
> +    if (find_dirty_bitmap_by_name(bs, name) >= 0) {
> +        return -EEXIST;
> +    }
> +
> +    /* Populate bm with passed data */
> +    bm->name = g_strdup(name);
> +    bm->bitmap_granularity = granularity;
> +    bm->bitmap_size = size;
> +
> +    bm->l1_size =
> +        size_to_clusters(s, (((size - 1) / sector_granularity) >> 3) + 1);
> +    l1_table_offset =
> +        qcow2_alloc_clusters(bs, s->l1_size * sizeof(uint64_t));
> +    if (l1_table_offset < 0) {
> +        ret = l1_table_offset;
> +        goto fail;
> +    }
> +    bm->l1_table_offset = l1_table_offset;
> +
> +    l1_table = g_try_new(uint64_t, bm->l1_size);
> +    if (l1_table == NULL) {
> +        ret = -ENOMEM;
> +        goto fail;
> +    }
> +
> +    /* initialize with zero clusters */
> +    for (i = 0; i < s->l1_size; i++) {
> +        l1_table[i] = cpu_to_be64(1);
> +    }
> +
> +    ret = qcow2_pre_write_overlap_check(bs, 0, bm->l1_table_offset,
> +                                        s->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    ret = bdrv_pwrite(bs->file, bm->l1_table_offset, l1_table,
> +                      s->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    g_free(l1_table);
> +    l1_table = NULL;
> +
> +    /* Append the new dirty bitmap to the dirty bitmap list */
> +    new_dirty_bitmap_list = g_new(QCowDirtyBitmap, s->nb_dirty_bitmaps + 1);
> +    if (s->dirty_bitmaps) {
> +        memcpy(new_dirty_bitmap_list, s->dirty_bitmaps,
> +               s->nb_dirty_bitmaps * sizeof(QCowDirtyBitmap));
> +        old_dirty_bitmap_list = s->dirty_bitmaps;
> +    }
> +    s->dirty_bitmaps = new_dirty_bitmap_list;
> +    s->dirty_bitmaps[s->nb_dirty_bitmaps++] = *bm;
> +
> +    ret = qcow2_write_dirty_bitmaps(bs);
> +    if (ret < 0) {
> +        g_free(s->dirty_bitmaps);
> +        s->dirty_bitmaps = old_dirty_bitmap_list;
> +        s->nb_dirty_bitmaps--;
> +        goto fail;
> +    }
> +
> +    g_free(old_dirty_bitmap_list);
> +
> +    return 0;
> +
> +fail:
> +    g_free(bm->name);
> +    g_free(l1_table);
> +
> +    return ret;
> +}
> +
> +static int qcow2_dirty_bitmap_free_clusters(BlockDriverState *bs,
> +                                            QCowDirtyBitmap *bm)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int ret, i;
> +    uint64_t *l1_table = g_new(uint64_t, bm->l1_size);
> +
> +    ret = bdrv_pread(bs->file, bm->l1_table_offset, l1_table,
> +                     bm->l1_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        g_free(l1_table);
> +        return ret;
> +    }
> +
> +    for (i = 0; i < bm->l1_size; ++i) {
> +        uint64_t addr = be64_to_cpu(l1_table[i]);
> +        qcow2_free_clusters(bs, addr, s->cluster_size, QCOW2_DISCARD_ALWAYS);
> +    }
> +
> +    qcow2_free_clusters(bs, bm->l1_table_offset, bm->l1_size * sizeof(uint64_t),
> +                        QCOW2_DISCARD_ALWAYS);
> +
> +    g_free(l1_table);
> +    return 0;
> +}
> +
> +int qcow2_dirty_bitmap_delete(BlockDriverState *bs,
> +                              const char *name,
> +                              Error **errp)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    QCowDirtyBitmap bm;
> +    int dirty_bitmap_index, ret = 0;
> +
> +    /* Search the dirty_bitmap */
> +    dirty_bitmap_index = find_dirty_bitmap_by_name(bs, name);
> +    if (dirty_bitmap_index < 0) {
> +        error_setg(errp, "Can't find the dirty bitmap");
> +        return -ENOENT;
> +    }
> +    bm = s->dirty_bitmaps[dirty_bitmap_index];
> +
> +    /* Remove it from the dirty_bitmap list */
> +    memmove(s->dirty_bitmaps + dirty_bitmap_index,
> +            s->dirty_bitmaps + dirty_bitmap_index + 1,
> +            (s->nb_dirty_bitmaps - dirty_bitmap_index - 1) * sizeof(bm));
> +    s->nb_dirty_bitmaps--;
> +    ret = qcow2_write_dirty_bitmaps(bs);
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret,
> +                         "Failed to remove dirty bitmap"
> +                         " from dirty bitmap list");
> +        return ret;
> +    }
> +
> +    qcow2_dirty_bitmap_free_clusters(bs, &bm);
> +    g_free(bm.name);
> +
> +    return ret;
> +}
> diff --git a/block/qcow2.c b/block/qcow2.c
> index b9a72e3..406e55d 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -61,6 +61,7 @@ typedef struct {
>   #define  QCOW2_EXT_MAGIC_END 0
>   #define  QCOW2_EXT_MAGIC_BACKING_FORMAT 0xE2792ACA
>   #define  QCOW2_EXT_MAGIC_FEATURE_TABLE 0x6803f857
> +#define  QCOW2_EXT_MAGIC_DIRTY_BITMAPS 0x23852875
>   
>   static int qcow2_probe(const uint8_t *buf, int buf_size, const char *filename)
>   {
> @@ -90,6 +91,7 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>       QCowExtension ext;
>       uint64_t offset;
>       int ret;
> +    Qcow2DirtyBitmapHeaderExt dirty_bitmaps_ext;
>   
>   #ifdef DEBUG_EXT
>       printf("qcow2_read_extensions: start=%ld end=%ld\n", start_offset, end_offset);
> @@ -160,6 +162,33 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>               }
>               break;
>   
> +        case QCOW2_EXT_MAGIC_DIRTY_BITMAPS:
> +            ret = bdrv_pread(bs->file, offset, &dirty_bitmaps_ext, ext.len);
> +            if (ret < 0) {
> +                error_setg_errno(errp, -ret, "ERROR: dirty_bitmaps_ext: "
> +                                 "Could not read ext header");
> +                return ret;
> +            }
> +
> +            be64_to_cpus(&dirty_bitmaps_ext.dirty_bitmaps_offset);
> +            be32_to_cpus(&dirty_bitmaps_ext.nb_dirty_bitmaps);
> +
> +            s->dirty_bitmaps_offset = dirty_bitmaps_ext.dirty_bitmaps_offset;
> +            s->nb_dirty_bitmaps = dirty_bitmaps_ext.nb_dirty_bitmaps;
> +
> +            ret = qcow2_read_dirty_bitmaps(bs);
> +            if (ret < 0) {
> +                error_setg_errno(errp, -ret, "Could not read dirty bitmaps");
> +                return ret;
> +            }
> +
> +#ifdef DEBUG_EXT
> +            printf("Qcow2: Got dirty bitmaps extension:"
> +                   " offset=%" PRIu64 " nb_bitmaps=%" PRIu32 "\n",
> +                   s->dirty_bitmaps_offset, s->nb_dirty_bitmaps);
> +#endif
> +            break;
> +
>           default:
>               /* unknown magic - save it in case we need to rewrite the header */
>               {
> @@ -1000,6 +1029,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>       g_free(s->unknown_header_fields);
>       cleanup_unknown_header_ext(bs);
>       qcow2_free_snapshots(bs);
> +    qcow2_free_dirty_bitmaps(bs);
>       qcow2_refcount_close(bs);
>       qemu_vfree(s->l1_table);
>       /* else pre-write overlap checks in cache_destroy may crash */
> @@ -1466,6 +1496,7 @@ static void qcow2_close(BlockDriverState *bs)
>       qemu_vfree(s->cluster_data);
>       qcow2_refcount_close(bs);
>       qcow2_free_snapshots(bs);
> +    qcow2_free_dirty_bitmaps(bs);
>   }
>   
>   static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
> @@ -1667,6 +1698,21 @@ int qcow2_update_header(BlockDriverState *bs)
>       buf += ret;
>       buflen -= ret;
>   
> +    if (s->nb_dirty_bitmaps > 0) {
> +        Qcow2DirtyBitmapHeaderExt dirty_bitmaps_header = {
> +            .nb_dirty_bitmaps = cpu_to_be32(s->nb_dirty_bitmaps),
> +            .dirty_bitmaps_offset = cpu_to_be64(s->dirty_bitmaps_offset)
> +        };
> +        ret = header_ext_add(buf, QCOW2_EXT_MAGIC_DIRTY_BITMAPS,
> +                             &dirty_bitmaps_header, sizeof(dirty_bitmaps_header),
> +                             buflen);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        buf += ret;
> +        buflen -= ret;
> +    }
> +
>       /* Keep unknown header extensions */
>       QLIST_FOREACH(uext, &s->unknown_header_ext, next) {
>           ret = header_ext_add(buf, uext->magic, uext->data, uext->len, buflen);
> @@ -2176,6 +2222,12 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset)
>           return -ENOTSUP;
>       }
>   
> +    /* cannot proceed if image has dirty_bitmaps */
> +    if (s->nb_dirty_bitmaps) {
> +        error_report("Can't resize an image which has dirty bitmaps");
> +        return -ENOTSUP;
> +    }
> +
>       /* shrinking is currently not supported */
>       if (offset < bs->total_sectors * 512) {
>           error_report("qcow2 doesn't support shrinking images yet");
> @@ -2952,6 +3004,10 @@ BlockDriver bdrv_qcow2 = {
>       .bdrv_get_info          = qcow2_get_info,
>       .bdrv_get_specific_info = qcow2_get_specific_info,
>   
> +    .bdrv_dirty_bitmap_load = qcow2_dirty_bitmap_load,
> +    .bdrv_dirty_bitmap_store = qcow2_dirty_bitmap_store,
> +    .bdrv_dirty_bitmap_delete = qcow2_dirty_bitmap_delete,
> +
>       .bdrv_save_vmstate    = qcow2_save_vmstate,
>       .bdrv_load_vmstate    = qcow2_load_vmstate,
>   
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 422b825..24beee0 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -39,6 +39,7 @@
>   
>   #define QCOW_MAX_CRYPT_CLUSTERS 32
>   #define QCOW_MAX_SNAPSHOTS 65536
> +#define QCOW_MAX_DIRTY_BITMAPS 65536
>   
>   /* 8 MB refcount table is enough for 2 PB images at 64k cluster size
>    * (128 GB for 512 byte clusters, 2 EB for 2 MB clusters) */
> @@ -52,6 +53,8 @@
>    * space for snapshot names and IDs */
>   #define QCOW_MAX_SNAPSHOTS_SIZE (1024 * QCOW_MAX_SNAPSHOTS)
>   
> +#define QCOW_MAX_DIRTY_BITMAPS_SIZE (1024 * QCOW_MAX_DIRTY_BITMAPS)
> +
>   /* indicate that the refcount of the referenced cluster is exactly one. */
>   #define QCOW_OFLAG_COPIED     (1ULL << 63)
>   /* indicate that the cluster is compressed (they never have the copied flag) */
> @@ -138,6 +141,19 @@ typedef struct QEMU_PACKED QCowSnapshotHeader {
>       /* name follows  */
>   } QCowSnapshotHeader;
>   
> +typedef struct QEMU_PACKED QCowDirtyBitmapHeader {
> +    /* header is 8 byte aligned */
> +    uint64_t l1_table_offset;
> +
> +    uint32_t l1_size;
> +    uint32_t bitmap_granularity;
> +
> +    uint64_t bitmap_size;
> +    uint16_t name_size;
> +
> +    /* name follows  */
> +} QCowDirtyBitmapHeader;
> +
>   typedef struct QEMU_PACKED QCowSnapshotExtraData {
>       uint64_t vm_state_size_large;
>       uint64_t disk_size;
> @@ -156,6 +172,14 @@ typedef struct QCowSnapshot {
>       uint64_t vm_clock_nsec;
>   } QCowSnapshot;
>   
> +typedef struct QCowDirtyBitmap {
> +    uint64_t l1_table_offset;
> +    uint32_t l1_size;
> +    char *name;
> +    int bitmap_granularity;
> +    uint64_t bitmap_size;
> +} QCowDirtyBitmap;
> +
>   struct Qcow2Cache;
>   typedef struct Qcow2Cache Qcow2Cache;
>   
> @@ -218,6 +242,11 @@ typedef uint64_t Qcow2GetRefcountFunc(const void *refcount_array,
>   typedef void Qcow2SetRefcountFunc(void *refcount_array,
>                                     uint64_t index, uint64_t value);
>   
> +typedef struct Qcow2DirtyBitmapHeaderExt {
> +    uint32_t nb_dirty_bitmaps;
> +    uint64_t dirty_bitmaps_offset;
> +} QEMU_PACKED Qcow2DirtyBitmapHeaderExt;
> +
>   typedef struct BDRVQcowState {
>       int cluster_bits;
>       int cluster_size;
> @@ -259,6 +288,11 @@ typedef struct BDRVQcowState {
>       unsigned int nb_snapshots;
>       QCowSnapshot *snapshots;
>   
> +    uint64_t dirty_bitmaps_offset;
> +    int dirty_bitmaps_size;
> +    unsigned int nb_dirty_bitmaps;
> +    QCowDirtyBitmap *dirty_bitmaps;
> +
>       int flags;
>       int qcow_version;
>       bool use_lazy_refcounts;
> @@ -570,6 +604,22 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs,
>   void qcow2_free_snapshots(BlockDriverState *bs);
>   int qcow2_read_snapshots(BlockDriverState *bs);
>   
> +/* qcow2-dirty-bitmap.c functions */
> +int qcow2_dirty_bitmap_store(BlockDriverState *bs, uint8_t *buf,
> +                             const char *name, uint64_t size,
> +                             int granularity);
> +uint8_t *qcow2_dirty_bitmap_load(BlockDriverState *bs,
> +                                 const char *name, uint64_t size,
> +                                 int granularity);
> +int qcow2_dirty_bitmap_create(BlockDriverState *bs, const char *name,
> +                              uint64_t size, int granularity);
> +int qcow2_dirty_bitmap_delete(BlockDriverState *bs,
> +                              const char *name,
> +                              Error **errp);
> +
> +void qcow2_free_dirty_bitmaps(BlockDriverState *bs);
> +int qcow2_read_dirty_bitmaps(BlockDriverState *bs);
> +
>   /* qcow2-cache.c functions */
>   Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
>   int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index db29b74..88855b4 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -206,6 +206,16 @@ struct BlockDriver {
>       int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi);
>       ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs);
>   
> +    int (*bdrv_dirty_bitmap_store)(BlockDriverState *bs, uint8_t *buf,
> +                                   const char *name, uint64_t size,
> +                                   int granularity);
> +    uint8_t *(*bdrv_dirty_bitmap_load)(BlockDriverState *bs,
> +                                       const char *name, uint64_t size,
> +                                       int granularity);
> +    int (*bdrv_dirty_bitmap_delete)(BlockDriverState *bs,
> +                                    const char *name,
> +                                    Error **errp);
> +
>       int (*bdrv_save_vmstate)(BlockDriverState *bs, QEMUIOVector *qiov,
>                                int64_t pos);
>       int (*bdrv_load_vmstate)(BlockDriverState *bs, uint8_t *buf,


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] qcow2: add autoclear bit for dirty bitmaps
  2015-08-27  7:45     ` Vladimir Sementsov-Ogievskiy
@ 2015-08-31 11:06       ` Vladimir Sementsov-Ogievskiy
  2015-08-31 22:39       ` Eric Blake
  1 sibling, 0 replies; 76+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-08-31 11:06 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha, den,
	pbonzini, jsnow

On 27.08.2015 10:45, Vladimir Sementsov-Ogievskiy wrote:
> On 09.06.2015 18:50, Stefan Hajnoczi wrote:
>> On Mon, Jun 08, 2015 at 06:21:24PM +0300, Vladimir 
>> Sementsov-Ogievskiy wrote:
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index 406e55d..f85a55a 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>> @@ -182,6 +182,14 @@ static int 
>>> qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>>>                   return ret;
>>>               }
>>>   +            if (!(s->autoclear_features & 
>>> QCOW2_AUTOCLEAR_DIRTY_BITMAPS) &&
>>> +                s->nb_dirty_bitmaps > 0) {
>>> +                ret = qcow2_delete_all_dirty_bitmaps(bs, errp);
>>> +                if (ret < 0) {
>>> +                    return ret;
>>> +                }
>>> +            }
>>> +
>> What if the file is read-only?
>>
>> We shouldn't modify the file in qcow2_read_extensions().
> But where? In qcow2_open? Or nowhere? I think auto clear extensions 
> should be cleared automatically..
>

May be, move clearing to qemu-img, and just warn here?

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-06-08 15:21 ` [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification Vladimir Sementsov-Ogievskiy
                     ` (4 preceding siblings ...)
  2015-08-24 14:04   ` Vladimir Sementsov-Ogievskiy
@ 2015-08-31 22:21   ` Eric Blake
  2015-08-31 22:24     ` John Snow
  5 siblings, 1 reply; 76+ messages in thread
From: Eric Blake @ 2015-08-31 22:21 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, Vladimir Sementsov-Ogievskiy, stefanha, pbonzini, den,
	jsnow

[-- Attachment #1: Type: text/plain, Size: 2055 bytes --]

On 06/08/2015 09:21 AM, Vladimir Sementsov-Ogievskiy wrote:
> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
> 
> Persistent dirty bitmaps will be saved into qcow2 files. It may be used
> as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
> other drives (there may be qcow2 file with zero disk size but with
> several dirty bitmaps for other drives).
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  docs/specs/qcow2.txt | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 66 insertions(+)
> 

> +== Dirty bitmaps ==
> +
> +Dirty bitmaps is an optional header extension. It provides a possibility of
> +storing dirty bitmaps in qcow2 image. The fields are:
> +
> +          0 -  3:  nb_dirty_bitmaps
> +                   Number of dirty bitmaps contained in the image
> +
> +          4 - 11:  dirty_bitmaps_offset
> +                   Offset into the image file at which the dirty bitmaps table
> +                   starts. Must be aligned to a cluster boundary.

To date, all 8-byte fields in qcow2 have been 8-byte aligned; this would
break that nice feature.  You could keep that nice property by swapping
the order of these two fields.

[Note that the spec on header extensions already requires clients to
recognize that a header extension of 12 bytes implicitly pads out an
additional 4 bytes, so that the next header extension type field once
again lands on 8-byte alignment]

> +== Dirty bitmaps ==
> +
> +The feature supports storing several dirty bitmaps in the qcow2 file.

Is it possible to have a qcow2 file that stores JUST dirty bitmap(s) and
no guest data (that is, no L1 table, no backing file)?  It might make
sense, if we intend to allow persistent bitmap files that can be
associated with a raw disk.  But right now, the spec seems to require
that l1_table_offset must be non-zero.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification
  2015-08-31 22:21   ` Eric Blake
@ 2015-08-31 22:24     ` John Snow
  0 siblings, 0 replies; 76+ messages in thread
From: John Snow @ 2015-08-31 22:24 UTC (permalink / raw)
  To: Eric Blake, Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: kwolf, den, Vladimir Sementsov-Ogievskiy, stefanha, pbonzini



On 08/31/2015 06:21 PM, Eric Blake wrote:
> On 06/08/2015 09:21 AM, Vladimir Sementsov-Ogievskiy wrote:
>> From: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
>>
>> Persistent dirty bitmaps will be saved into qcow2 files. It may be used
>> as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for
>> other drives (there may be qcow2 file with zero disk size but with
>> several dirty bitmaps for other drives).
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>  docs/specs/qcow2.txt | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 66 insertions(+)
>>
> 
>> +== Dirty bitmaps ==
>> +
>> +Dirty bitmaps is an optional header extension. It provides a possibility of
>> +storing dirty bitmaps in qcow2 image. The fields are:
>> +
>> +          0 -  3:  nb_dirty_bitmaps
>> +                   Number of dirty bitmaps contained in the image
>> +
>> +          4 - 11:  dirty_bitmaps_offset
>> +                   Offset into the image file at which the dirty bitmaps table
>> +                   starts. Must be aligned to a cluster boundary.
> 
> To date, all 8-byte fields in qcow2 have been 8-byte aligned; this would
> break that nice feature.  You could keep that nice property by swapping
> the order of these two fields.
> 
> [Note that the spec on header extensions already requires clients to
> recognize that a header extension of 12 bytes implicitly pads out an
> additional 4 bytes, so that the next header extension type field once
> again lands on 8-byte alignment]
> 
>> +== Dirty bitmaps ==
>> +
>> +The feature supports storing several dirty bitmaps in the qcow2 file.
> 
> Is it possible to have a qcow2 file that stores JUST dirty bitmap(s) and
> no guest data (that is, no L1 table, no backing file)?  It might make
> sense, if we intend to allow persistent bitmap files that can be
> associated with a raw disk.  But right now, the spec seems to require
> that l1_table_offset must be non-zero.
> 

To my knowledge, this is a consequence of the decision that bitmaps do
not have to be related to the data contained within the .qcow2.

Ultimately, it's fine (for our purposes) there's no data in the qcow2. I
have been testing with a 0-length qcow2 to test this patchset.

--js

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] qcow2: add autoclear bit for dirty bitmaps
  2015-08-27  7:45     ` Vladimir Sementsov-Ogievskiy
  2015-08-31 11:06       ` Vladimir Sementsov-Ogievskiy
@ 2015-08-31 22:39       ` Eric Blake
  2015-08-31 22:50         ` Eric Blake
  1 sibling, 1 reply; 76+ messages in thread
From: Eric Blake @ 2015-08-31 22:39 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Stefan Hajnoczi
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha,
	pbonzini, den, jsnow

[-- Attachment #1: Type: text/plain, Size: 2860 bytes --]

On 08/27/2015 01:45 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 09.06.2015 18:50, Stefan Hajnoczi wrote:
>> On Mon, Jun 08, 2015 at 06:21:24PM +0300, Vladimir Sementsov-Ogievskiy
>> wrote:
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index 406e55d..f85a55a 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>> @@ -182,6 +182,14 @@ static int
>>> qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>>>                   return ret;
>>>               }
>>>   +            if (!(s->autoclear_features &
>>> QCOW2_AUTOCLEAR_DIRTY_BITMAPS) &&
>>> +                s->nb_dirty_bitmaps > 0) {
>>> +                ret = qcow2_delete_all_dirty_bitmaps(bs, errp);
>>> +                if (ret < 0) {
>>> +                    return ret;
>>> +                }
>>> +            }
>>> +
>> What if the file is read-only?
>>
>> We shouldn't modify the file in qcow2_read_extensions().
> But where? In qcow2_open? Or nowhere? I think auto clear extensions
> should be cleared automatically..

Autoclear bits should be cleared ONLY when opening a file for writing,
and ONLY if the version of qemu[-img] opening the file does not
recognize what the bit controls (or if it does recognize the bit, but is
about to perform a semantic action that violates what the bit represents).

We should already be clearing all unrecognized autoclear bits upon
opening a file for writing (if not, that's a bug in the current
implementation), when executing older qemu[-img].  And after your patch
series, we know how to handle dirty bitmaps, so the dirty bitmap
autoclear bit should no longer be cleared automatically (it is no longer
in the mask of unrecognized autoclear bits).  So all we have to do now
that we are new-enough qemu[-img] is:

1. be sure to set the autoclear bit any time we write a dirty bitmap
(the image can no longer be safely written by an older qemu[-img],
because those older executables don't know how to interpret the dirty
bitmap extension header and might try to overwrite a cluster that we
have tied up in a dirty bitmap)

2. clear the bit if we are removing the last dirty bitmap from an image
(optimization that is not strictly necessary; but lets older qemu[-img]
once again be able to write to the file without the risk of corrupting it)

3. add in error reporting in case the autoclear bit is clear but the
dirty bitmap header extension is present with a non-zero number of
bitmaps (the autoclear bit served its purpose: an older qemu[-img] has
opened the file for writing since new qemu last handled it, and may have
broken our bitmaps)

Since opening a file read-only cannot (further) corrupt the image, it
also does not need to clear any autoclear bits.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] qcow2: add autoclear bit for dirty bitmaps
  2015-08-31 22:39       ` Eric Blake
@ 2015-08-31 22:50         ` Eric Blake
  0 siblings, 0 replies; 76+ messages in thread
From: Eric Blake @ 2015-08-31 22:50 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Stefan Hajnoczi
  Cc: kwolf, qemu-devel, Vladimir Sementsov-Ogievskiy, stefanha, den,
	pbonzini, jsnow

[-- Attachment #1: Type: text/plain, Size: 2612 bytes --]

On 08/31/2015 04:39 PM, Eric Blake wrote:

>>>> +++ b/block/qcow2.c
>>>> @@ -182,6 +182,14 @@ static int
>>>> qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
>>>>                   return ret;
>>>>               }
>>>>   +            if (!(s->autoclear_features &
>>>> QCOW2_AUTOCLEAR_DIRTY_BITMAPS) &&
>>>> +                s->nb_dirty_bitmaps > 0) {
>>>> +                ret = qcow2_delete_all_dirty_bitmaps(bs, errp);
>>>> +                if (ret < 0) {
>>>> +                    return ret;
>>>> +                }
>>>> +            }
>>>> +
>>> What if the file is read-only?
>>>
>>> We shouldn't modify the file in qcow2_read_extensions().
>> But where? In qcow2_open? Or nowhere? I think auto clear extensions
>> should be cleared automatically..
> 

> 3. add in error reporting in case the autoclear bit is clear but the
> dirty bitmap header extension is present with a non-zero number of
> bitmaps (the autoclear bit served its purpose: an older qemu[-img] has
> opened the file for writing since new qemu last handled it, and may have
> broken our bitmaps)

This code is attempting to do the error recovery if an older qemu opened
the file for writing and thus cleared the unknown bit. But silently
dropping the probably-corrupt bitmaps is not nice; an error message
would be nicer, as well as requiring an explicit 'qemu-img check -r' as
the way to recover the space occupied by the bitmaps.

And thinking about it a bit more, I wonder if we should (independently)
add a new safety flag into qemu and/or qemu-img, which allows the user
the option to refuse to open a file read-write if the file contains an
autoclear flag that is not recognized, rather than the current default
of opening the file anyways and clearing the bit.  The default behavior
is safe but may cause data loss (where presumably the lost data is not
that important, or we would have made it an incompatible feature bit
instead of an autoclear bit), so the safety flag would give users a bit
more control on whether they are okay with modifying a file knowing that
the modifications will clear the feature.  But I guess 'qemu-img info'
already knows how to report unknown autoclear bits (thanks to the
feature name table extension header) in a read-only manner, so it is
already possible to do a read-only probe of a file to see if it contains
unknown autoclear bits before doing a read-write open; and maybe we
don't need a safety flag after all.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

end of thread, other threads:[~2015-08-31 22:51 UTC | newest]

Thread overview: 76+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-08 15:21 [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps Vladimir Sementsov-Ogievskiy
2015-06-08 15:21 ` [Qemu-devel] [PATCH 1/8] spec: add qcow2-dirty-bitmaps specification Vladimir Sementsov-Ogievskiy
2015-06-09 16:01   ` John Snow
2015-06-09 17:03   ` Stefan Hajnoczi
2015-06-10  8:19     ` Vladimir Sementsov-Ogievskiy
2015-06-10  8:49       ` Vladimir Sementsov-Ogievskiy
2015-06-10 13:00       ` Eric Blake
2015-06-11 10:16         ` Vladimir Sementsov-Ogievskiy
2015-06-10 13:24       ` Stefan Hajnoczi
2015-06-11 10:19         ` Vladimir Sementsov-Ogievskiy
2015-06-11 13:03           ` Stefan Hajnoczi
2015-06-11 16:21             ` John Snow
2015-06-12 10:28               ` Stefan Hajnoczi
2015-06-12 15:19                 ` John Snow
2015-06-10 15:34   ` Kevin Wolf
2015-06-11 10:25     ` Vladimir Sementsov-Ogievskiy
2015-06-11 16:30       ` John Snow
2015-06-12  8:33         ` Kevin Wolf
2015-08-24 10:46     ` Vladimir Sementsov-Ogievskiy
2015-08-24 13:30   ` Vladimir Sementsov-Ogievskiy
2015-08-24 14:08     ` Vladimir Sementsov-Ogievskiy
2015-08-24 14:04   ` Vladimir Sementsov-Ogievskiy
2015-08-31 22:21   ` Eric Blake
2015-08-31 22:24     ` John Snow
2015-06-08 15:21 ` [Qemu-devel] [PATCH 2/8] qcow2: add dirty-bitmaps feature Vladimir Sementsov-Ogievskiy
2015-06-09 16:52   ` Stefan Hajnoczi
2015-06-10 14:30   ` Stefan Hajnoczi
2015-06-12 19:02     ` John Snow
2015-06-15 14:42       ` Stefan Hajnoczi
2015-06-23 17:57         ` John Snow
2015-06-24  9:39           ` Stefan Hajnoczi
2015-08-14 17:14     ` Vladimir Sementsov-Ogievskiy
2015-08-26  9:09       ` Stefan Hajnoczi
2015-06-11 23:04   ` John Snow
2015-06-15 14:05     ` Vladimir Sementsov-Ogievskiy
2015-06-15 16:53       ` John Snow
2015-06-12 21:55   ` John Snow
2015-08-26 13:15     ` Vladimir Sementsov-Ogievskiy
2015-08-26 14:14       ` Vladimir Sementsov-Ogievskiy
2015-08-27 12:43   ` Vladimir Sementsov-Ogievskiy
2015-06-08 15:21 ` [Qemu-devel] [PATCH 3/8] block: store persistent dirty bitmaps Vladimir Sementsov-Ogievskiy
2015-06-08 15:21 ` [Qemu-devel] [PATCH 4/8] block: add bdrv_load_dirty_bitmap Vladimir Sementsov-Ogievskiy
2015-06-09 16:01   ` Stefan Hajnoczi
2015-06-10 22:33     ` John Snow
2015-06-11 10:41       ` Vladimir Sementsov-Ogievskiy
2015-06-08 15:21 ` [Qemu-devel] [PATCH 5/8] qcow2: add qcow2_dirty_bitmap_delete_all Vladimir Sementsov-Ogievskiy
2015-06-08 15:21 ` [Qemu-devel] [PATCH 6/8] qcow2: add autoclear bit for dirty bitmaps Vladimir Sementsov-Ogievskiy
2015-06-09 15:49   ` Stefan Hajnoczi
2015-06-09 15:50   ` Stefan Hajnoczi
2015-08-27  7:45     ` Vladimir Sementsov-Ogievskiy
2015-08-31 11:06       ` Vladimir Sementsov-Ogievskiy
2015-08-31 22:39       ` Eric Blake
2015-08-31 22:50         ` Eric Blake
2015-06-10 23:42   ` John Snow
2015-06-11  8:35     ` Kevin Wolf
2015-06-11 10:49     ` Vladimir Sementsov-Ogievskiy
2015-06-11 16:36       ` John Snow
2015-06-08 15:21 ` [Qemu-devel] [PATCH 7/8] qemu: command line option " Vladimir Sementsov-Ogievskiy
2015-06-11 20:57   ` John Snow
2015-06-12 21:49   ` John Snow
2015-06-08 15:21 ` [Qemu-devel] [PATCH 8/8] iotests: test internal persistent dirty bitmap Vladimir Sementsov-Ogievskiy
2015-06-09 16:17   ` Eric Blake
2015-06-10 15:27 ` [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps Stefan Hajnoczi
2015-06-11 11:22   ` Vladimir Sementsov-Ogievskiy
2015-06-11 13:14     ` Stefan Hajnoczi
2015-06-11 20:06 ` Stefan Hajnoczi
2015-06-12  9:58   ` Denis V. Lunev
2015-06-12 10:36     ` Stefan Hajnoczi
2015-08-26  6:26       ` Vladimir Sementsov-Ogievskiy
2015-08-26  9:13         ` Stefan Hajnoczi
2015-06-12 19:34 ` John Snow
2015-06-17 14:29   ` Vladimir Sementsov-Ogievskiy
2015-06-24  0:21     ` John Snow
2015-07-08 12:24       ` Vladimir Sementsov-Ogievskiy
2015-07-08 15:21         ` John Snow
2015-08-27 10:08       ` Vladimir Sementsov-Ogievskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.