13.04.2018 23:02, John Snow wrote:
On 04/12/2018 10:08 AM, Vladimir Sementsov-Ogievskiy wrote:
> It's not easier, as we'll have to implement either separate of bitmaps
> concept of checkpoints, which will be based on bitmaps, and we'll have
> to negotiate and implement storing these objects to qcow2 and migrate
> them. Or we'll go through proposed by Kevin (If I remember correctly)
> way of adding something like "backing" or "parent" pointer to
> BdrvDirtyBitmap, and anyway store to qcow2, migrate and expose qapi for
> them. The other (more hard way) is move to multi-bit bitmaps (like in
> vmware), where for each granularity-chunk we store a number,
> representing "before which checkpoint was the latest change of this
> chunk", and the same, qapi+qcow2+migration.
>
> It's all not easier than call several simple qmp commands.
OK, I just wanted to explore the option before we settled on using the
name as metadata.
What are the downsides to actually including a predecessor/successor*
pointer in QEMU?
The problem is the following: we want checkpoints, and it is bad to
implement them through an additional mechanism which is definitely not
"checkpoints".
With checkpoints in qemu:
- for the user, checkpoints are unrelated to dirty bitmaps; they are
managed separately, so it's safe
- clean API implementation in libvirt: to remove a checkpoint, libvirt
will call qmp block-checkpoint-remove
With an additional pointer in BdrvDirtyBitmap:
- checkpoint-related bitmaps share the same namespace with other bitmaps
- the user can still remove a bitmap from the chain without the
corresponding merge, which breaks the whole thing
- we'll need to implement an API like the block layer's: bitmap-commit,
bitmap-pull, etc. (or just leave my merge), but none of that is what we
want; it's not checkpoints.
So my point is: if we are going to implement something complicated,
let's implement entirely what we want, not a semi-solution. Otherwise,
implement a minimal and simple thing, to just make it all work (my
current solution).
So, if you agree with me that the true way is a checkpoints API for qemu,
there are things which we need to implement:
1. multi-bit dirty bitmaps (if I remember correctly, such a thing is done
in vmware), that is:
For each data chunk we have not one bit but several, and store a
number which refers to the last checkpoint after which there were changes
in this area. So, if we want to get the blocks changed from snapshot N up
to the current time, we just take all blocks for which this number is >= N.
It is more memory-efficient than storing several dirty bitmaps.
On the other hand, several linked dirty bitmaps have another
advantage: we can keep only the last, active one in RAM and the others
on disk, and load them only on demand, when we need to merge. So, it
looks like the true way is a combination of multi- and one-bit dirty
bitmaps, with the ability to load/store them to disk dynamically. So we need
2. Some link(s) in BdrvDirtyBitmap, to implement relations. Maybe it's
better to store two links to the checkpoints between which the bitmap
defines the difference (or to several checkpoints, if it is a multi-bit
bitmap).
3. Checkpoint objects, with separate management, backed by dirty
bitmaps (one- or multi-bit). These bitmaps should not be directly
accessible by the user, but the user should be able to set up a
strategy (one- or multi-bit, or their combinations, keep all in RAM, or
keep inactive on disk and active in RAM, etc).
4. All these things should be stored to qcow2, all should migrate
successfully, and we also need to think about NBD exporting (however, it
looks like the NBD protocol is flexible enough to do it)
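To make the multi-bit idea from item 1 concrete, here is a minimal
Python sketch. It is not QEMU code: the class and method names are
invented for the example, the per-chunk number is kept in a plain list
instead of packed bits, and I assume checkpoints are numbered from 1,
with 0 reserved for "never written".

```python
class MultiBitBitmap:
    """Toy model: one small integer per chunk instead of one bit."""

    def __init__(self, size, granularity):
        self.granularity = granularity
        nb_chunks = (size + granularity - 1) // granularity
        # 0 means "never written"; checkpoint numbers start at 1.
        self.last_change = [0] * nb_chunks
        # Number of the most recent checkpoint (1 is the implicit first one).
        self.current = 1

    def create_checkpoint(self):
        """Start a new checkpoint and return its number."""
        self.current += 1
        return self.current

    def mark_dirty(self, offset, length):
        """Record a guest write: stamp each touched chunk with the
        number of the most recent checkpoint."""
        if length <= 0:
            return
        first = offset // self.granularity
        last = (offset + length - 1) // self.granularity
        for chunk in range(first, last + 1):
            self.last_change[chunk] = self.current

    def changed_since(self, n):
        """Chunks changed from checkpoint n up to now: number >= n."""
        if n < 1:
            raise ValueError("checkpoint numbers start at 1")
        return [i for i, c in enumerate(self.last_change) if c >= n]
```

One list answers the "changed since N" query for every N at once, which
is where the memory saving over several one-bit bitmaps comes from.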
===
Also, we need to understand what the use cases for all this are.
case 1. Incremental restore to some point in the past: if we know which
blocks were modified since this point, we can copy only these blocks from
the backup. But it's obvious that this information can be extracted from
the backup itself (we should know which blocks were actually backed up).
So I'm not sure that this is all worth doing.
case 2. Several inc-backup chains to different backup storages with
different schedules. Actually, we support this with just several active
dirty bitmaps. But it looks inefficient:
What is the reason to maintain several active dirty bitmaps which are
seldom used? They eat RAM and CPU time on each write. It looks better to
have only one active bitmap and several disabled ones, which we can store
on disk, not in RAM. And this leads us to checkpoints. Checkpoints are
more natural than dirty bitmaps for users making backups. And
checkpoints give a way to improve RAM and CPU usage.
As a first step the following may be done:
Add two string fields to BdrvDirtyBitmap:
checkpoint-from
checkpoint-to
These define the checkpoint names. For such bitmaps the name field should
be NULL. Add these fields to the qcow2 bitmap representation and to the
migration protocol. Add a checkpoint API (create/remove/nbd export).
Deprecate the bitmap API (move to checkpoints for the drive- and
blockdev- backup commands). We can add a "parent" or similar pointer to
BdrvDirtyBitmap, but it should be only an implementation detail, not a
user-visible thing.
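A rough Python sketch of this first step, just to show the intended
semantics. Nothing here is existing QEMU API: Bitmap, CheckpointChain
and their methods are hypothetical names, and dirty chunks are kept in
a set for brevity. The point is that removing a checkpoint always
merges the two adjoining bitmaps, so the chain cannot be broken.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Bitmap:
    checkpoint_from: str
    checkpoint_to: Optional[str]  # None marks the single active bitmap
    dirty: Set[int] = field(default_factory=set)


class CheckpointChain:
    def __init__(self, first_checkpoint: str):
        self.bitmaps: List[Bitmap] = [Bitmap(first_checkpoint, None)]

    def _index(self, name: str) -> int:
        for i, bm in enumerate(self.bitmaps):
            if bm.checkpoint_from == name:
                return i
        raise KeyError(name)

    def write(self, chunk: int) -> None:
        # Only the last (active) bitmap records guest writes.
        self.bitmaps[-1].dirty.add(chunk)

    def create_checkpoint(self, name: str) -> None:
        # Close the active bitmap and start a new active one.
        self.bitmaps[-1].checkpoint_to = name
        self.bitmaps.append(Bitmap(name, None))

    def remove_checkpoint(self, name: str) -> None:
        i = self._index(name)
        if i == 0:
            if len(self.bitmaps) == 1:
                raise ValueError("cannot remove the only checkpoint")
            del self.bitmaps[0]  # nothing older depends on it
            return
        # Merge the bitmap starting at `name` into its predecessor.
        prev, cur = self.bitmaps[i - 1], self.bitmaps[i]
        prev.dirty |= cur.dirty
        prev.checkpoint_to = cur.checkpoint_to
        del self.bitmaps[i]

    def changed_since(self, name: str) -> Set[int]:
        # Union of every bitmap from checkpoint `name` up to now.
        result: Set[int] = set()
        for bm in self.bitmaps[self._index(name):]:
            result |= bm.dirty
        return result
```

In this model only the last bitmap is ever written to, so the closed
ones could live on disk and be loaded only for a merge or a query.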
(1) We'd need to amend the bitmap persistence format
(2) We'd need to amend some of the bitmap management commands
(3) We'd need to make sure it migrates correctly:
(A) Shared storage should be fine; just flush to disk and pivot
(B) Live storage needs to learn a new field to migrate.
Certainly it's not ...trivial, but not terribly difficult either. I
wonder if it's the right thing to do in lieu of the naming hacks in libvirt.
There wasn't really a chorus of applause for the idea of having
checkpoints more officially implemented in QEMU, but... abusing the name
metadata still makes me feel like we're doing something wrong --
especially if a third party utility that doesn't understand the concept
of your naming scheme comes along and modifies a bitmap.
It feels tenuous and likely to break, so I'd like to formalize it more.
We can move this discussion over to the QEMU lists if you think it's
worth talking about.
Or I'll just roll with it. I'll see what Eric thinks, I guess? :)
*(Uh-oh, that term is overloaded for QEMU bitmap internals... we can
address that later...)
--
Best regards,
Vladimir