From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:35973)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <vsementsov@odin.com>) id 1ZUu6i-0006bO-0d
	for qemu-devel@nongnu.org; Thu, 27 Aug 2015 06:08:33 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <vsementsov@odin.com>) id 1ZUu6d-0003vC-Ln
	for qemu-devel@nongnu.org; Thu, 27 Aug 2015 06:08:31 -0400
Received: from mx2.parallels.com ([199.115.105.18]:35287)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <vsementsov@odin.com>) id 1ZUu6d-0003v0-EY
	for qemu-devel@nongnu.org; Thu, 27 Aug 2015 06:08:27 -0400
Message-ID: <55DEE18D.5060006@virtuozzo.com>
Date: Thu, 27 Aug 2015 13:08:13 +0300
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
MIME-Version: 1.0
References: <1433776886-27239-1-git-send-email-vsementsov@virtuozzo.com>
	<557B3449.4090301@redhat.com> <55818449.9090801@virtuozzo.com>
	<5589F800.40302@redhat.com>
In-Reply-To: <5589F800.40302@redhat.com>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: John Snow <jsnow@redhat.com>, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, pbonzini@redhat.com, Fam Zheng <famz@redhat.com>, stefanha@redhat.com, den@openvz.org

On 24.06.2015 03:21, John Snow wrote:
>
> On 06/17/2015 10:29 AM, Vladimir Sementsov-Ogievskiy wrote:
>> On 12.06.2015 22:34, John Snow wrote:
>>>
...
>>>
>>> (9) Data consistency
>>>
>>> We need to discuss the data safety element to this. I think that
>>> atomically before the first write is flushed to disk, the dirty bitmap
>>> needs to *at least* set a bit in the bitmap header that indicates that
>>> the bitmap is no longer up-to-date.
>>>
>>> When the bitmap is later flushed to disk, that bit can be cleared until
>>> the next write occurs, which repeats the process.

Not the next write, but the next change in the bitmap. A write may not
change the bitmap at all (if the corresponding bit is already dirty). This is
the key point, and it can seriously extend the lifetime of in_use=0.
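
To illustrate the point, here is a minimal sketch in Python (not QEMU code). The class TrackedBitmap and the counter header_writes are hypothetical names used only for this example; in_use stands for the "inconsistent" flag discussed above. The flag is only set when a bit actually flips, so repeated writes to an already dirty region leave the on-disk state valid.

```python
class TrackedBitmap:
    """Toy dirty bitmap; in_use models the on-disk 'inconsistent' flag."""

    def __init__(self, nbits):
        self.bits = [False] * nbits
        self.in_use = False      # False: the on-disk bitmap is still valid
        self.header_writes = 0   # how many times the flag had to be set on disk

    def mark_dirty(self, i):
        if self.bits[i]:
            return               # bit already dirty: bitmap unchanged, flag untouched
        self.bits[i] = True
        if not self.in_use:
            self.in_use = True   # first real change since the last flush
            self.header_writes += 1

    def flush(self):
        # Pretend the bitmap was written out; the on-disk copy is valid again.
        self.in_use = False
```

A write that hits bit 3 after a flush, when bit 3 is already dirty, does not touch in_use; only a write that dirties a new bit does.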

>>>
>>> We have discussed this (long ago) in the past, but one of the ideas was
>>> to monitor the relative utilization rate of the disk and attempt to
>>> flush the bitmap whenever there was a lull in disk IO, then clear the
>>> "inconsistent" bit.
>>>
>>> On close, the flush of data and bitmap both would lead us to clear this
>>> bit as well.
>>>
>>> Upon boot, if the inconsistent bit was set, we'd know that the bitmap
>>> was outdated and we'd have to recommend that the bitmap be cleared and a
>>> new bitmap started.
>>>
>>> (Or, perhaps, a data-intensive mode where we compare the current data
>>> mode with the most recent incremental backup to re-determine what data
>>> has changed. This would be very, very slow but an option at least for
>>> recovery if started a new full backup is even less desirable.)
>>>
>>> Other ideas involve regularly flushing the bitmap at certain timed
>>> intervals, certain usage intervals (e.g. when the changed bitmap data
>>> reaches some total size, like 64KiB of changed bits), or a combination
>>> of regular intervals with "opportunistic" flushing during Disk IO lulls.
>>>
>>> This is a key feature that absolutely needs to make it into the base
>>> series, IMO.
>> I don't understand, what the use of flushing bitmap not only on
>> disk:close? If there no failures with disk, than bitmap will be flushed
>> on close and will be consistent for next open(). If there is a disk
>> crash, even if we flush the bitmap regularly, what is the possibility of
>> crashing immediately after last flush, before further io-s?
>>
> The usage case is QEMU crash, power failure, etc. Not disk crash. If we
> periodically flush to HD, we increase the chances that we don't corrupt
> our image and bitmap.
>
> If we NEVER flush, we guarantee that any segfault or power outage will
> absolutely trash our data.

Also, I have the following idea:

The disk is written often.
The bitmap is updated less often.
The previous level of the HBitmap is updated even less often.

Instead of storing all bitmap levels in the file, just save in the image
file the number of the largest consistent level:


flush bitmap: consistent_level = HBITMAP_MAX_LEVEL

change bitmap level X: if consistent_level >= X then consistent_level = X
- 1 (and flush consistent_level to file)
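
The two rules can be sketched like this in Python (an illustration, not QEMU code). LevelTracker, on_flush, on_level_changed and level_writes are hypothetical names, and the value 6 for HBITMAP_MAX_LEVEL is only an assumed example. The sketch treats a change at level X as invalidating level X itself, so it lowers the stored value whenever consistent_level is X or larger.

```python
HBITMAP_MAX_LEVEL = 6  # assumed index of the last (most detailed) level


class LevelTracker:
    """Tracks the largest level whose on-disk contents are still valid."""

    def __init__(self):
        self.consistent_level = HBITMAP_MAX_LEVEL
        self.level_writes = 0  # times consistent_level had to be written to the file

    def on_flush(self):
        # The whole bitmap was just written out, so every level is consistent.
        self.consistent_level = HBITMAP_MAX_LEVEL

    def on_level_changed(self, x):
        # Level x no longer matches the file; levels above it are unaffected.
        if self.consistent_level >= x:
            self.consistent_level = x - 1
            self.level_writes += 1
```

Changes that stay within an already invalidated level cost no extra write of consistent_level; only a change that reaches a higher level does.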

Then, after a failure, we can restore the bitmap from the last consistent level:

gran = 1 << (level_bits * (HBITMAP_MAX_LEVEL - consistent_level))
bitmap[i] = bitmap[i - i % gran] OR bitmap[i - i % gran + 1] OR ... OR 
bitmap[i - i % gran + (gran - 1)]
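
A small Python sketch of this restore step (an illustration, not QEMU code). The function name restore and its parameters are hypothetical; the bitmap is modelled as a flat list of booleans holding the last level as it was read from the file. Every aligned group of gran bits is replaced by the OR of the group, which is pessimistic but never loses a dirty bit as long as the consistent level itself is valid.

```python
def restore(bitmap, level_bits, max_level, consistent_level):
    """Rebuild the last level from stale data, trusting only consistent_level."""
    gran = 1 << (level_bits * (max_level - consistent_level))
    restored = []
    for start in range(0, len(bitmap), gran):
        group = bitmap[start:start + gran]
        # If any bit in the group is set, mark the whole group dirty.
        restored.extend([any(group)] * len(group))
    return restored
```

With consistent_level equal to max_level, gran is 1 and the bitmap is returned unchanged; each level lost multiplies the group size by 2^level_bits.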


To make this scheme independent of HBitmap, it may be better to number the
levels from 0 (0 being the largest level), and to save level_bits to the image
file too.


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.