From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:55338)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jsnow@redhat.com>) id 1Z3Uiu-00076w-6y
	for qemu-devel@nongnu.org; Fri, 12 Jun 2015 15:34:42 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <jsnow@redhat.com>) id 1Z3Uip-0006OK-Ve
	for qemu-devel@nongnu.org; Fri, 12 Jun 2015 15:34:40 -0400
Received: from mx1.redhat.com ([209.132.183.28]:45838)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jsnow@redhat.com>) id 1Z3Uip-0006OD-L8
	for qemu-devel@nongnu.org; Fri, 12 Jun 2015 15:34:35 -0400
Message-ID: <557B3449.4090301@redhat.com>
Date: Fri, 12 Jun 2015 15:34:33 -0400
From: John Snow <jsnow@redhat.com>
MIME-Version: 1.0
References: <1433776886-27239-1-git-send-email-vsementsov@virtuozzo.com>
In-Reply-To: <1433776886-27239-1-git-send-email-vsementsov@virtuozzo.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, pbonzini@redhat.com, Fam Zheng <famz@redhat.com>, stefanha@redhat.com, den@openvz.org


On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
> v2:
>  - rebase on my 'Dirty bitmaps migration' series
>  - remove 'print dirty bitmap', 'query-dirty-bitmap' and use md5 for
>    testing like with dirty bitmaps migration
>  - autoclean features
>=20
> v1:
>=20
> The bitmaps are saved into qcow2 file format. It provides both
> 'internal' and 'external' dirty bitmaps feature:
>  - for qcow2 drives we can store bitmaps in the same file
>  - for other formats we can store bitmaps in the separate qcow2 file
>=20
> QCow2 header is extended by fields 'nb_dirty_bitmaps' and
> 'dirty_bitmaps_offset' like with snapshots.
>=20
> Proposed command line syntax is the following:
>=20
> -dirty-bitmap [option1=3Dval1][,option2=3Dval2]...
>     Available options are:
>     name         The name for the bitmap (necessary).
>=20
>     file         The file to load the bitmap from.
>=20
>     file_id      When specified with 'file' option, then this file will
>                  be available through this id for other -dirty-bitmap
>                  options when specified without 'file' option, then it
>                  is a reference to 'file', specified with another
>                  -dirty-bitmap option, and it will be used to load the
>                  bitmap from.
>=20
>     drive        The drive to bind the bitmap to. It should be specifie=
d
>                  as 'id' suboption of one of -drive options. If nor
>                  'file' neither 'file_id' are specified, then the bitma=
p
>                  will be loaded from that drive (internal dirty bitmap)=
.
>=20
>     granularity  The granularity for the bitmap. Not necessary, the
>                  default value may be used.
>=20
>     enabled      on|off. Default is 'on'. Disabled bitmaps are not
>                  changing regardless of writes to corresponding drive.
>=20
> Examples:
>=20
> qemu -drive file=3Da.qcow2,id=3Ddisk -dirty-bitmap name=3Db,drive=3Ddis=
k
> qemu -drive file=3Da.raw,id=3Ddisk \
>      -dirty-bitmap name=3Db,drive=3Ddisk,file=3Db.qcow2,enabled=3Doff
>=20
> Vladimir Sementsov-Ogievskiy (8):
>   spec: add qcow2-dirty-bitmaps specification
>   qcow2: add dirty-bitmaps feature
>   block: store persistent dirty bitmaps
>   block: add bdrv_load_dirty_bitmap
>   qcow2: add qcow2_dirty_bitmap_delete_all
>   qcow2: add autoclear bit for dirty bitmaps
>   qemu: command line option for dirty bitmaps
>   iotests: test internal persistent dirty bitmap
>=20
>  block.c                       |  82 +++++++
>  block/Makefile.objs           |   2 +-
>  block/qcow2-dirty-bitmap.c    | 537 ++++++++++++++++++++++++++++++++++=
++++++++
>  block/qcow2.c                 |  69 +++++-
>  block/qcow2.h                 |  61 +++++
>  blockdev.c                    |  38 +++
>  docs/specs/qcow2.txt          |  66 ++++++
>  include/block/block.h         |   9 +
>  include/block/block_int.h     |  10 +
>  include/sysemu/blockdev.h     |   1 +
>  include/sysemu/sysemu.h       |   1 +
>  qemu-options.hx               |  37 +++
>  tests/qemu-iotests/118        |  83 +++++++
>  tests/qemu-iotests/118.out    |   5 +
>  tests/qemu-iotests/group      |   1 +
>  tests/qemu-iotests/iotests.py |   6 +
>  vl.c                          | 100 ++++++++
>  17 files changed, 1105 insertions(+), 3 deletions(-)
>  create mode 100644 block/qcow2-dirty-bitmap.c
>  create mode 100755 tests/qemu-iotests/118
>  create mode 100644 tests/qemu-iotests/118.out
>=20

Well, you said "RFC" ... So here's some "C" that you RF'd.

Many of these points are a "wish list" of sorts and don't necessarily
have to be implemented all at once, but we should be careful to design
the core series with the later additions in mind.

Many of these items are things that I wouldn't mind working on
(Primarily the QMP interfaces), provided that the core of this series
will allow for them to exist. I can take many of the QMP/transaction
interface projects, for instance.

I'm starting to think we won't be able to squeeze this in for 2.4, but
we can have a bulk of the work well underway for 2.5, by which point I
am hopeful that libvirt will be beginning to pick up motion for
integration of this feature.

I think that the basic approach you have so far is good, we just have to
plan out our required extensions and then we can review the base to make
sure it supports the features we want in the near future.


(1) General storage design

- Persistence bitmaps can be stored in any arbitrary qcow2 file,
regardless of if that qcow2 holds data or not.

- Any given qcow2 file with or without data can hold bitmaps intended
for any number of other drives.

- Dirty bitmaps are not assumed to be able to be stored in any
particular location.

So far, this is good. I like the flexibility this provides. This lets us
do all kinds of cool things like store bitmaps for 20 different raw
drives inside of a single 'bitmaps.qcow2' if we wish.


(2) Bitmaps added via QMP do not get any persistence attributes.

This is something we'll need to change. Existing QMP commands that let
us modify bitmaps:

block-dirty-bitmap-add		[+transaction]
block-dirty-bitmap-remove
block-dirty-bitmap-clear	[+transaction]

- block-dirty-bitmap-add:

We will want the ability for bitmap-add to specify a persistence option.
What I am less clear on is what this attribute should look like.

should we add target: <filename> as an attribute here,
or should it be target: <node> to specify the file object that we want
to store this bitmap in? Or perhaps both?:

mode: file, target: <filename>
mode: node, target: <node>

Or even an explicit usability feature that lets us specify that we wish
to store the bitmap for the drive we're attaching it to:

block-dirty-bitmap=3Dadd node=3Ddrive0 name=3Dbitmap0 mode=3Dself

The implication here is that the default value for persist could be
"none", which does not attempt to store this bitmap anywhere.

- block-dirty-bitmap-remove

If we remove a bitmap with persistence options active, it needs to be
cleared out of the file it is being stored in. Currently we use
"release" to remove a bitmap, which deletes only the in-memory portion
of the bitmap, so you also use release in your series to delete
in-memory bitmaps after we're done with them.

I think the semantics of the "remove" QMP option here, however, should
include a call to the storage layer to remove the bitmap in question.

Let's split the "release" function into two functions:
(A) bdrv_dirty_bitmap_free (which just frees the in-memory bits)
(B) bdrv_dirty_bitmap_delete (which relies on _free but deletes from
disk also.)

Then bdrv_close can use bitmap_free, but the QMP remove command can
utilize _delete.

- block-dirty-bitmap-clear:

This needs to clear the bitmap on-disk if it has persistence features
active.

- block-dirty-bitmap-copy:

This is only a proposal currently, but worth us keeping it in mind. We
should decide on copy semantics. Should the copy keep the persistence
attributes of the source bitmap by default and allow a user to override
it if desired, or should we force the persistence attribute back to
null/None until the user overrides?

I suspect defaulting it to no persistence is probably the sanest until
we're told otherwise (either via an extension to the copy command or a
later edit command.)

Since the QMP interfaces has been my area so far, I can draft their
addition as a new series if you'd like.


(3) Additional QMP interfaces

We should add the ability to modify a bitmap's persistence after it has
been added.

block-dirty-bitmap-edit mode=3D<file,node,self,none> target=3D<...>

This will allow us to add persistence to a bitmap after creation, or
remove persistence from a bitmap without deleting it if it's no longer
desired.

Perhaps at a later date we could even have it change where the bitmap is
stored through this mechanism.

(Usability features might include the ability for us to rename or change
the granularity of the bitmap, too -- but that's future usability stuff,
not core functionality.)

Like the above, I can draft this addition.


(4) Storage Format

I think overall the bitmap extension headers look sane, but Kevin is the
ultimate authority here.

I /would/ like to see an additional header bitfield reserved
for some arbitrary flags that can be used at a later date. A uint32_t
should be sufficient for now, with some of the upper bits reserved
either for an extension or a version field to allow us to expand the
bitmap headers in the future if necessary.


(5) Bitmap autoloading

Bitmaps are not currently automatically loaded if you pass e.g. (-hda
my_drive_that_also_has_bitmaps.qcow2). This is in part because the drive
a bitmap was intended for is not information stored with the bitmap, so
QEMU has no concept or ability to be able to "auto load" bitmaps.

Hinted at earlier by my desire to see something like mode=3Dself, we
should add some flags to the dirty bitmap header stored with each bitmap:

0x01: "This bitmap describes the file it is stored in"
0x02: "This bitmap should be auto-loaded when this file is opened."
0x04: "This bitmap is read-only (disabled.)"

This way, with a properly modern version of QEMU, you could simply just:

qemu -M q35 -enable-kvm -hda windows10.qcow2

and if there were bitmaps inside of windows10.qcow2 that had 0x01 and
0x02 set, you'd get those bitmaps loaded before any IO to the data
clusters of the .qcow2, ensuring data integrity.

Of course, I think that it is currently too complicated to try to
accomplish autoloading of bitmaps for *other* drives, so let's not worry
about that now. This means 0x02 set without 0x01 would be an error.

Of course, when autoloading bitmaps, we'll have to check that the size
of the bitmap matches the size of the drive. This is easy to do, though.

The 0x01 bit can be set automatically when that circumstance is
detected, and 0x02 can be set perhaps as an option to
--dirty-bitmap auto=3Dyes
or via the QMP
block-dirty-bitmap-add ... auto=3Dyes
or via the edit command,
block-dirty-bitmap-edit ... auto=3Dyes

Maybe we could also set it implicitly if mode=3Dself is used, too.


(6) qemu-img interface

Stefan has mentioned that it would be nice to implement a query ability
to qemu-img to list bitmaps stored in qcow2 files, along with some of
their key attributes. size, granularity, any flags. It's probably not
efficient to list the dirty count, unless we begin storing that
information manually in the header. I don't think there's a strong need
for that level of info, though.

I can handle this part, if you'd like.


(7) CLI interface

- The only way to get a bitmap loaded into memory from file is to use
the --dirty-bitmap argument where you specify the name, file,
destination drive, and granularity.

- The only way to create a new bitmap that will integrate with the
persistence features is to specify a new bitmap that does not currently
exist within a file and allow the qcow2 layer to create the in-memory
bitmap for us.

This helps us with the flexibility that makes this design a winning
choice overall, but it's cumbersome for some special common cases I
think we should be supporting.

As mentioned previously, I think granularity should not be part
of the lookup process -- just creation, and even then I think this CLI
syntax should not automatically create bitmaps if it wasn't found -- if
the user didn't intend to make a bitmap, an error is likely more
appropriate.

Perhaps --dirty-bitmap create=3Dtrue,[...] would be sufficient for
specifying intent here, at which point granularity makes sense for the
creation process.

As for the granularity, I think this should be appropriate:

--dirty-bitmap file=3Dbitmaps.qcow2,name=3Dbitmap0,drive=3Ddrive0

And that should be sufficient to look in bitmaps.qcow2, find 'bitmap0',
and attach it to 'drive0', throwing an error if the sizes don't match.


(8) Namespaces

Stefan also asked me about the bitmap namespaces -- in-memory of course,
each node can have their own "bitmap0" without any collisions because
all bitmaps are always referred to by their (bs,name) pair.

How do we address bitmaps inside a file, though?

If any given bitmap containing .qcow2 file can store an arbitrary number
of bitmaps intended for an arbitrary number of destinations, how do we
handle this?

EXAMPLE:
-dirty-bitmap name=3Dbitmap0,drive=3Ddrive0,file=3Dbitmaps.qcow2
-dirty-bitmap name=3Dbitmap0,drive=3Ddrive1,file=3Dbitmaps.qcow2

I think this might currently do very funky things, if bitmaps.qcow2 is
currently empty -- I think both calls will succeed, but it will fail
later when it tries to store them and cannot.

I think we need to do one of two things:

(A) Keep the namespace inside of a .qcow2 file as it is now, but ALWAYS
check up front if a bitmap *can* be added to the file. This way we don't
run into problems after we've dirtied the bitmap.

(B) Find a way to accommodate bitmaps with the same names that were
intended for different nodes.

I don't have a good idea for #2, so I think #1 is probably the way to
go. We can amend the bitmap documentation to specify that although the
bitmap names are unique per-node, if you want to store them in the same
file, you're going to want to give them globally unique names.


(9) Data consistency

We need to discuss the data safety element to this. I think that
atomically before the first write is flushed to disk, the dirty bitmap
needs to *at least* set a bit in the bitmap header that indicates that
the bitmap is no longer up-to-date.

When the bitmap is later flushed to disk, that bit can be cleared until
the next write occurs, which repeats the process.

We have discussed this (long ago) in the past, but one of the ideas was
to monitor the relative utilization rate of the disk and attempt to
flush the bitmap whenever there was a lull in disk IO, then clear the
"inconsistent" bit.

On close, the flush of data and bitmap both would lead us to clear this
bit as well.

Upon boot, if the inconsistent bit was set, we'd know that the bitmap
was outdated and we'd have to recommend that the bitmap be cleared and a
new bitmap started.

(Or, perhaps, a data-intensive mode where we compare the current data
mode with the most recent incremental backup to re-determine what data
has changed. This would be very, very slow but an option at least for
recovery if started a new full backup is even less desirable.)

Other ideas involve regularly flushing the bitmap at certain timed
intervals, certain usage intervals (e.g. when the changed bitmap data
reaches some total size, like 64KiB of changed bits), or a combination
of regular intervals with "opportunistic" flushing during Disk IO lulls.

This is a key feature that absolutely needs to make it into the base
series, IMO.


(10) Storage Efficiency

We should discuss the usage of meta bitmaps or ancillary bitmaps to
record which parts of our bitmap data need to be flushed to disk in
order to reduce flush/close time.

The current meta bitmap implementation optimizes for 1KiB writes to the
network (which fits well under the standard 1500bytes), but perhaps we
could optimize for local storage block size and use this to be stingy
about how much data we decide to write to disk.

I believe this is another feature that should be included in the initial
series as well, because it might radically impact the core design.


(11) Migration

Stefan already touched on this, but we should be mindful of the
different kinds of migration scenarios.

We might migrate the disks, or they might be shared already.

We might migrate (or share) a disk, but what happens if we didn't
migrate or didn't share the bitmap storage file that we were using?

Bitmaps without persistence data will migrate just fine, but how do we
intend to migrate the persistence data itself? I suppose as a first pass
we can just tap into the migration calls and migrate some properties like=
:

"This bitmap relies on node_id=3Dxxxx to save its bitmap"

and that should probably work for either kind of storage migration
tactic. The only problem would be nodes without IDs that we opened by
filename ...

...Another technique would be for any bitmap that is persistent is to
store them all first prior to migration and then allow the destination
to load them anew. This would also work for either shared or migrated
storage if we worked it right.

It seems a little hairy, and I don't have the answers right now...
Something I will ponder on the weekend.