From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:55338) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z3Uiu-00076w-6y for qemu-devel@nongnu.org; Fri, 12 Jun 2015 15:34:42 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Z3Uip-0006OK-Ve for qemu-devel@nongnu.org; Fri, 12 Jun 2015 15:34:40 -0400 Received: from mx1.redhat.com ([209.132.183.28]:45838) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z3Uip-0006OD-L8 for qemu-devel@nongnu.org; Fri, 12 Jun 2015 15:34:35 -0400 Message-ID: <557B3449.4090301@redhat.com> Date: Fri, 12 Jun 2015 15:34:33 -0400 From: John Snow MIME-Version: 1.0 References: <1433776886-27239-1-git-send-email-vsementsov@virtuozzo.com> In-Reply-To: <1433776886-27239-1-git-send-email-vsementsov@virtuozzo.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Vladimir Sementsov-Ogievskiy , qemu-devel@nongnu.org Cc: kwolf@redhat.com, pbonzini@redhat.com, Fam Zheng , stefanha@redhat.com, den@openvz.org On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote: > v2: > - rebase on my 'Dirty bitmaps migration' series > - remove 'print dirty bitmap', 'query-dirty-bitmap' and use md5 for > testing like with dirty bitmaps migration > - autoclean features >=20 > v1: >=20 > The bitmaps are saved into qcow2 file format. It provides both > 'internal' and 'external' dirty bitmaps feature: > - for qcow2 drives we can store bitmaps in the same file > - for other formats we can store bitmaps in the separate qcow2 file >=20 > QCow2 header is extended by fields 'nb_dirty_bitmaps' and > 'dirty_bitmaps_offset' like with snapshots. >=20 > Proposed command line syntax is the following: >=20 > -dirty-bitmap [option1=3Dval1][,option2=3Dval2]... > Available options are: > name The name for the bitmap (necessary). >=20 > file The file to load the bitmap from. >=20 > file_id When specified with 'file' option, then this file will > be available through this id for other -dirty-bitmap > options when specified without 'file' option, then it > is a reference to 'file', specified with another > -dirty-bitmap option, and it will be used to load the > bitmap from. >=20 > drive The drive to bind the bitmap to. It should be specifie= d > as 'id' suboption of one of -drive options. If nor > 'file' neither 'file_id' are specified, then the bitma= p > will be loaded from that drive (internal dirty bitmap)= . >=20 > granularity The granularity for the bitmap. Not necessary, the > default value may be used. >=20 > enabled on|off. Default is 'on'. Disabled bitmaps are not > changing regardless of writes to corresponding drive. >=20 > Examples: >=20 > qemu -drive file=3Da.qcow2,id=3Ddisk -dirty-bitmap name=3Db,drive=3Ddis= k > qemu -drive file=3Da.raw,id=3Ddisk \ > -dirty-bitmap name=3Db,drive=3Ddisk,file=3Db.qcow2,enabled=3Doff >=20 > Vladimir Sementsov-Ogievskiy (8): > spec: add qcow2-dirty-bitmaps specification > qcow2: add dirty-bitmaps feature > block: store persistent dirty bitmaps > block: add bdrv_load_dirty_bitmap > qcow2: add qcow2_dirty_bitmap_delete_all > qcow2: add autoclear bit for dirty bitmaps > qemu: command line option for dirty bitmaps > iotests: test internal persistent dirty bitmap >=20 > block.c | 82 +++++++ > block/Makefile.objs | 2 +- > block/qcow2-dirty-bitmap.c | 537 ++++++++++++++++++++++++++++++++++= ++++++++ > block/qcow2.c | 69 +++++- > block/qcow2.h | 61 +++++ > blockdev.c | 38 +++ > docs/specs/qcow2.txt | 66 ++++++ > include/block/block.h | 9 + > include/block/block_int.h | 10 + > include/sysemu/blockdev.h | 1 + > include/sysemu/sysemu.h | 1 + > qemu-options.hx | 37 +++ > tests/qemu-iotests/118 | 83 +++++++ > tests/qemu-iotests/118.out | 5 + > tests/qemu-iotests/group | 1 + > tests/qemu-iotests/iotests.py | 6 + > vl.c | 100 ++++++++ > 17 files changed, 1105 insertions(+), 3 deletions(-) > create mode 100644 block/qcow2-dirty-bitmap.c > create mode 100755 tests/qemu-iotests/118 > create mode 100644 tests/qemu-iotests/118.out >=20 Well, you said "RFC" ... So here's some "C" that you RF'd. Many of these points are a "wish list" of sorts and don't necessarily have to be implemented all at once, but we should be careful to design the core series with the later additions in mind. Many of these items are things that I wouldn't mind working on (Primarily the QMP interfaces), provided that the core of this series will allow for them to exist. I can take many of the QMP/transaction interface projects, for instance. I'm starting to think we won't be able to squeeze this in for 2.4, but we can have a bulk of the work well underway for 2.5, by which point I am hopeful that libvirt will be beginning to pick up motion for integration of this feature. I think that the basic approach you have so far is good, we just have to plan out our required extensions and then we can review the base to make sure it supports the features we want in the near future. (1) General storage design - Persistence bitmaps can be stored in any arbitrary qcow2 file, regardless of if that qcow2 holds data or not. - Any given qcow2 file with or without data can hold bitmaps intended for any number of other drives. - Dirty bitmaps are not assumed to be able to be stored in any particular location. So far, this is good. I like the flexibility this provides. This lets us do all kinds of cool things like store bitmaps for 20 different raw drives inside of a single 'bitmaps.qcow2' if we wish. (2) Bitmaps added via QMP do not get any persistence attributes. This is something we'll need to change. Existing QMP commands that let us modify bitmaps: block-dirty-bitmap-add [+transaction] block-dirty-bitmap-remove block-dirty-bitmap-clear [+transaction] - block-dirty-bitmap-add: We will want the ability for bitmap-add to specify a persistence option. What I am less clear on is what this attribute should look like. should we add target: as an attribute here, or should it be target: to specify the file object that we want to store this bitmap in? Or perhaps both?: mode: file, target: mode: node, target: Or even an explicit usability feature that lets us specify that we wish to store the bitmap for the drive we're attaching it to: block-dirty-bitmap=3Dadd node=3Ddrive0 name=3Dbitmap0 mode=3Dself The implication here is that the default value for persist could be "none", which does not attempt to store this bitmap anywhere. - block-dirty-bitmap-remove If we remove a bitmap with persistence options active, it needs to be cleared out of the file it is being stored in. Currently we use "release" to remove a bitmap, which deletes only the in-memory portion of the bitmap, so you also use release in your series to delete in-memory bitmaps after we're done with them. I think the semantics of the "remove" QMP option here, however, should include a call to the storage layer to remove the bitmap in question. Let's split the "release" function into two functions: (A) bdrv_dirty_bitmap_free (which just frees the in-memory bits) (B) bdrv_dirty_bitmap_delete (which relies on _free but deletes from disk also.) Then bdrv_close can use bitmap_free, but the QMP remove command can utilize _delete. - block-dirty-bitmap-clear: This needs to clear the bitmap on-disk if it has persistence features active. - block-dirty-bitmap-copy: This is only a proposal currently, but worth us keeping it in mind. We should decide on copy semantics. Should the copy keep the persistence attributes of the source bitmap by default and allow a user to override it if desired, or should we force the persistence attribute back to null/None until the user overrides? I suspect defaulting it to no persistence is probably the sanest until we're told otherwise (either via an extension to the copy command or a later edit command.) Since the QMP interfaces has been my area so far, I can draft their addition as a new series if you'd like. (3) Additional QMP interfaces We should add the ability to modify a bitmap's persistence after it has been added. block-dirty-bitmap-edit mode=3D target=3D<...> This will allow us to add persistence to a bitmap after creation, or remove persistence from a bitmap without deleting it if it's no longer desired. Perhaps at a later date we could even have it change where the bitmap is stored through this mechanism. (Usability features might include the ability for us to rename or change the granularity of the bitmap, too -- but that's future usability stuff, not core functionality.) Like the above, I can draft this addition. (4) Storage Format I think overall the bitmap extension headers look sane, but Kevin is the ultimate authority here. I /would/ like to see an additional header bitfield reserved for some arbitrary flags that can be used at a later date. A uint32_t should be sufficient for now, with some of the upper bits reserved either for an extension or a version field to allow us to expand the bitmap headers in the future if necessary. (5) Bitmap autoloading Bitmaps are not currently automatically loaded if you pass e.g. (-hda my_drive_that_also_has_bitmaps.qcow2). This is in part because the drive a bitmap was intended for is not information stored with the bitmap, so QEMU has no concept or ability to be able to "auto load" bitmaps. Hinted at earlier by my desire to see something like mode=3Dself, we should add some flags to the dirty bitmap header stored with each bitmap: 0x01: "This bitmap describes the file it is stored in" 0x02: "This bitmap should be auto-loaded when this file is opened." 0x04: "This bitmap is read-only (disabled.)" This way, with a properly modern version of QEMU, you could simply just: qemu -M q35 -enable-kvm -hda windows10.qcow2 and if there were bitmaps inside of windows10.qcow2 that had 0x01 and 0x02 set, you'd get those bitmaps loaded before any IO to the data clusters of the .qcow2, ensuring data integrity. Of course, I think that it is currently too complicated to try to accomplish autoloading of bitmaps for *other* drives, so let's not worry about that now. This means 0x02 set without 0x01 would be an error. Of course, when autoloading bitmaps, we'll have to check that the size of the bitmap matches the size of the drive. This is easy to do, though. The 0x01 bit can be set automatically when that circumstance is detected, and 0x02 can be set perhaps as an option to --dirty-bitmap auto=3Dyes or via the QMP block-dirty-bitmap-add ... auto=3Dyes or via the edit command, block-dirty-bitmap-edit ... auto=3Dyes Maybe we could also set it implicitly if mode=3Dself is used, too. (6) qemu-img interface Stefan has mentioned that it would be nice to implement a query ability to qemu-img to list bitmaps stored in qcow2 files, along with some of their key attributes. size, granularity, any flags. It's probably not efficient to list the dirty count, unless we begin storing that information manually in the header. I don't think there's a strong need for that level of info, though. I can handle this part, if you'd like. (7) CLI interface - The only way to get a bitmap loaded into memory from file is to use the --dirty-bitmap argument where you specify the name, file, destination drive, and granularity. - The only way to create a new bitmap that will integrate with the persistence features is to specify a new bitmap that does not currently exist within a file and allow the qcow2 layer to create the in-memory bitmap for us. This helps us with the flexibility that makes this design a winning choice overall, but it's cumbersome for some special common cases I think we should be supporting. As mentioned previously, I think granularity should not be part of the lookup process -- just creation, and even then I think this CLI syntax should not automatically create bitmaps if it wasn't found -- if the user didn't intend to make a bitmap, an error is likely more appropriate. Perhaps --dirty-bitmap create=3Dtrue,[...] would be sufficient for specifying intent here, at which point granularity makes sense for the creation process. As for the granularity, I think this should be appropriate: --dirty-bitmap file=3Dbitmaps.qcow2,name=3Dbitmap0,drive=3Ddrive0 And that should be sufficient to look in bitmaps.qcow2, find 'bitmap0', and attach it to 'drive0', throwing an error if the sizes don't match. (8) Namespaces Stefan also asked me about the bitmap namespaces -- in-memory of course, each node can have their own "bitmap0" without any collisions because all bitmaps are always referred to by their (bs,name) pair. How do we address bitmaps inside a file, though? If any given bitmap containing .qcow2 file can store an arbitrary number of bitmaps intended for an arbitrary number of destinations, how do we handle this? EXAMPLE: -dirty-bitmap name=3Dbitmap0,drive=3Ddrive0,file=3Dbitmaps.qcow2 -dirty-bitmap name=3Dbitmap0,drive=3Ddrive1,file=3Dbitmaps.qcow2 I think this might currently do very funky things, if bitmaps.qcow2 is currently empty -- I think both calls will succeed, but it will fail later when it tries to store them and cannot. I think we need to do one of two things: (A) Keep the namespace inside of a .qcow2 file as it is now, but ALWAYS check up front if a bitmap *can* be added to the file. This way we don't run into problems after we've dirtied the bitmap. (B) Find a way to accommodate bitmaps with the same names that were intended for different nodes. I don't have a good idea for #2, so I think #1 is probably the way to go. We can amend the bitmap documentation to specify that although the bitmap names are unique per-node, if you want to store them in the same file, you're going to want to give them globally unique names. (9) Data consistency We need to discuss the data safety element to this. I think that atomically before the first write is flushed to disk, the dirty bitmap needs to *at least* set a bit in the bitmap header that indicates that the bitmap is no longer up-to-date. When the bitmap is later flushed to disk, that bit can be cleared until the next write occurs, which repeats the process. We have discussed this (long ago) in the past, but one of the ideas was to monitor the relative utilization rate of the disk and attempt to flush the bitmap whenever there was a lull in disk IO, then clear the "inconsistent" bit. On close, the flush of data and bitmap both would lead us to clear this bit as well. Upon boot, if the inconsistent bit was set, we'd know that the bitmap was outdated and we'd have to recommend that the bitmap be cleared and a new bitmap started. (Or, perhaps, a data-intensive mode where we compare the current data mode with the most recent incremental backup to re-determine what data has changed. This would be very, very slow but an option at least for recovery if started a new full backup is even less desirable.) Other ideas involve regularly flushing the bitmap at certain timed intervals, certain usage intervals (e.g. when the changed bitmap data reaches some total size, like 64KiB of changed bits), or a combination of regular intervals with "opportunistic" flushing during Disk IO lulls. This is a key feature that absolutely needs to make it into the base series, IMO. (10) Storage Efficiency We should discuss the usage of meta bitmaps or ancillary bitmaps to record which parts of our bitmap data need to be flushed to disk in order to reduce flush/close time. The current meta bitmap implementation optimizes for 1KiB writes to the network (which fits well under the standard 1500bytes), but perhaps we could optimize for local storage block size and use this to be stingy about how much data we decide to write to disk. I believe this is another feature that should be included in the initial series as well, because it might radically impact the core design. (11) Migration Stefan already touched on this, but we should be mindful of the different kinds of migration scenarios. We might migrate the disks, or they might be shared already. We might migrate (or share) a disk, but what happens if we didn't migrate or didn't share the bitmap storage file that we were using? Bitmaps without persistence data will migrate just fine, but how do we intend to migrate the persistence data itself? I suppose as a first pass we can just tap into the migration calls and migrate some properties like= : "This bitmap relies on node_id=3Dxxxx to save its bitmap" and that should probably work for either kind of storage migration tactic. The only problem would be nodes without IDs that we opened by filename ... ...Another technique would be for any bitmap that is persistent is to store them all first prior to migration and then allow the destination to load them anew. This would also work for either shared or migrated storage if we worked it right. It seems a little hairy, and I don't have the answers right now... Something I will ponder on the weekend.