On 04/23/2018 06:38 AM, Nikolay Shirokovskiy wrote:
On 21.04.2018 00:26, Eric Blake wrote:
> On 04/20/2018 01:24 PM, John Snow wrote:
>
>>>> Why is option 3 unworkable, exactly?:
>>>>
>>>> (3) Checkpoints exist as structures only with libvirt. They are saved
>>>> and remembered in the XML entirely.
>>>>
>>>> Or put another way:
>>>>
>>>> Can you explain to me why it's important for libvirt to be able to
>>>> reconstruct checkpoint information from a qcow2 file?
>>>>
>>>
>>> In short, it takes extra effort to keep the metadata consistent when
>>> libvirtd crashes. For a more detailed explanation, see [1], starting
>>> from the words "Yes it is possible".
>>>
>>> [1] https://www.redhat.com/archives/libvir-list/2018-April/msg01001.html
>
> I'd argue the converse. Libvirt already knows how to do atomic updates
> of XML files that it tracks. If libvirtd crashes/restarts in the middle
> of an API call, you already have indeterminate results of whether the
> API worked or failed; once libvirtd is restarted, you'll probably have
> to retry the command. For all other cases, the API call
> completes, and either no XML changes were made (the command failed and
> reports the failure properly), or all XML changes were made (the command
> created the appropriate changes to track the new checkpoint, including
> whatever bitmap names have to be recorded to map the relation between
> checkpoints and bitmaps).
We can fail to save the XML... Suppose we have B1 and B2, and create bitmap B3
in the process of creating checkpoint C3. qemu creates the snapshot
and the bitmap successfully, then libvirt fails to update the XML, and after some
time libvirt restarts (it does not even have to crash). Now libvirt knows of B1 and B2 but not B3.
What can the consequences be? For example, if we ask for the bitmap from C2, we
miss all changes from C3 on, as we don't know of B3. This will lead to corrupted
backups.
This can be fixed:
- in qemu. If bitmaps have a child/parent relationship, then on libvirt restart
we can recover (we ask qemu for bitmaps, discover B3, and then discover that
B3 is a child of B2). This is basically how the implementation with a naming
scheme works. Going this way, we don't need special metadata in
libvirt (besides maybe the domain XML attached to a checkpoint, etc.)
- in libvirt. If we save the XML before creating a snapshot with a checkpoint.
This fixes the case where the operation succeeds but saving the XML fails.
But now we have another issue :) We can save the XML successfully, but then the operation
itself can fail and we can fail to revert the XML. Well, in this case we can recover
even without child/parent metadata in qemu. Just ask
qemu for bitmaps on libvirt restart, and if a bitmap is missing, kick
it out, since this is the case described above (XML saved successfully, then
the qemu operation failed)
This option seems perfectly workable to me...
So it is possible to track bitmaps in libvirt. We just need to be
extra careful not to produce invalid backups.
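A minimal sketch of the restart check described in the two options above. The function name `reconcile` and both input lists are hypothetical: neither libvirt nor qemu provides this helper, and the sketch assumes the bitmap names have already been read from the saved XML and queried from qemu.

```python
def reconcile(xml_bitmaps, qemu_bitmaps):
    """Compare bitmap names recorded in libvirt XML with those qemu reports.

    Returns (stale, untracked):
      stale     - in the XML but not in qemu
                  (XML was saved first, then the qemu operation failed)
      untracked - in qemu but not in the XML
                  (qemu operation succeeded, then saving the XML failed)
    """
    xml_set = set(xml_bitmaps)
    qemu_set = set(qemu_bitmaps)
    stale = sorted(xml_set - qemu_set)
    untracked = sorted(qemu_set - xml_set)
    return stale, untracked


# XML saved after the qemu operation, and the save failed: B3 is unknown.
print(reconcile(["B1", "B2"], ["B1", "B2", "B3"]))

# XML saved before the qemu operation, and the operation failed: B3 is stale.
print(reconcile(["B1", "B2", "B3"], ["B1", "B2"]))
```

With the "save XML first" ordering only the stale case can occur, so recovery reduces to dropping the stale names. An untracked bitmap is the dangerous case, because its place in the chain cannot be recovered without parent information from qemu or a naming scheme.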
>
> Consider the case of internal snapshots. Already, we have the case
> where qemu itself does not track enough useful metadata about internal
> snapshots (right now, just a name and timestamp of creation); so libvirt
> additionally tracks further information in <domainsnapshot>: the name,
> timestamp, relationship to any previous snapshot (libvirt can then
> reconstruct a tree relationship between all snapshots; where a parent
> can have more than one child if you roll back to a snapshot and then
> execute the guest differently), the set of disks participating in the
> snapshot, and the <domain> description at the time of the snapshot (if
> you hotplug devices, or even the fact that creating external snapshots
> changes which file is the active qcow2 in a backing chain, you'll need
> to know how to roll back to the prior domain state as part of
> reverting). This is approximately the same set of information that a
> <domaincheckpoint> will need to track.
I would differentiate checkpoints and backups. For example, in the case
of push backups we can store additional metadata in <domainbackup>
so that later we can revert to a previous state. But checkpoints
(technically, bitmaps) exist only to make incremental backups (restores?).
We can attach extra metadata to checkpoints, but that looks accidental: it is only because
bitmaps and backups relate to the same point in time. To me, a (push) backup
can carry all the metadata, and a backup can have an
associated checkpoint or not. For example, if we choose to always
make full backups, we don't need checkpoints at all (at least if we are
not going to use them for restore).
Well ... if we create checkpoints alongside full backups, then you have
points to reference to create future incremental backups. You don't need
checkpoints if you *NEVER* use an incremental backup. If we want the
feature enabled, so to speak, you likely need to be making checkpoints
alongside full backups.
I'd say the cases in which we don't want them -- once the feature is
enabled -- are hard to find.
>
> I'm slightly tempted to just overload <domainsnapshot> to track three
> modes instead of two (internal, external, and now checkpoint); but think
> that will probably be a bit too confusing, so more likely I will create
> <domaincheckpoint> as a new object, but copy a lot of coding paradigms
> from <domainsnapshot>.
I wonder if you are going to use a tree or a list structure for backups.
To me it is much easier to think of backups just as a sequence of states
in time. For example, consider the Grandfather-Father-Son scheme of Acronis backups [1].
A typical backup sequence can look like:
F - I - I - I - I - D - I - I - I - I - D
Where F is a full monthly backup, I is an incremental daily backup and D is
a differential weekly backup (no backups on Saturday and Sunday).
This is the representation from the time POV. From the backup dependencies POV it looks like this:
F - I - I - I - I   D - I - I - I - I   D
 \------------------|                   |
 \--------------------------------------|
or more common representation:
F - I - I - I - I
 \- D - I - I - I - I
 \- D - I - I - I - I
To me, using a tree structure for snapshots is appropriate because each branching
point is some semantic state ("basic OS installed") and the branches are different
trials from that point. In the backup case I guess we don't want branching on recovery
to some backup; we just want to keep the selected backup scheme going. So for example,
if on Wednesday we recover to the previous week's Friday, then later on Wednesday we
will have the regular Wednesday backup as if we had not recovered. This keeps
things simple for the client, who would otherwise drown in dependencies (especially after
a couple of recoveries).
But your representation is itself a tree -- is this a good argument
against hierarchical information ... ?
If you don't utilize the hierarchy, the degenerate form is indeed just a
list:
F - I - I - I - I - I - I - I - I - I ...
everything has just one successor.
I think Eric just feels he can get good code re-use out of the
<domainsnapshot> element -- since each <snapshot> element itself
references a parent ID; there's no real "cost" to tracking a tree
instead of a list.
There's nothing stopping you from adding three checkpoints that have the
same parent, so to speak.
I think this is just something that might wind up happening "for free"
due to the nature of how libvirt stores relational data at all.
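To illustrate the point that a parent reference yields a tree "for free", here is a small sketch. The dictionary layout and the names `children_by_parent` and `is_linear` are assumptions for illustration only, not libvirt's actual data model.

```python
from collections import defaultdict


def children_by_parent(parents):
    """parents maps checkpoint name -> parent name (None for the root)."""
    kids = defaultdict(list)
    for name, parent in parents.items():
        kids[parent].append(name)
    return dict(kids)


def is_linear(parents):
    """True when every checkpoint has at most one successor (a plain list)."""
    return all(len(c) <= 1 for c in children_by_parent(parents).values())


# Degenerate form: each checkpoint has exactly one successor.
chain = {"C1": None, "C2": "C1", "C3": "C2"}

# Three checkpoints that share the same parent.
branched = {"C1": None, "C2": "C1", "C3": "C1", "C4": "C1"}

print(is_linear(chain), is_linear(branched))
```

The same storage handles both shapes; whether branching is ever used is a policy decision on top of it.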
Of course, internally we need to track backup dependencies in order to
properly delete backups or recover from them.
[1] https://www.acronis.com/en-us/support/documentation/AcronisBackup_11.5/in...
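A rough sketch of the dependency tracking mentioned above, using the Grandfather-Father-Son shape from the diagrams. The names `restore_chain` and `can_delete` and the backup labels are hypothetical, and the sketch assumes each backup records the single backup it depends on.

```python
def restore_chain(parents, target):
    """Return the backups needed to restore `target`, oldest first."""
    chain = []
    seen = set()
    node = target
    while node is not None:
        if node in seen:
            raise ValueError("dependency cycle at " + node)
        seen.add(node)
        chain.append(node)
        node = parents[node]
    return chain[::-1]


def can_delete(parents, name):
    """A backup can be deleted safely only if nothing depends on it."""
    return all(parent != name for parent in parents.values())


# F: full, I*: incremental (depends on the previous backup),
# D1: differential (depends directly on the full backup).
deps = {"F": None, "I1": "F", "I2": "I1", "D1": "F", "I3": "D1"}

print(restore_chain(deps, "I3"))
print(can_delete(deps, "I2"), can_delete(deps, "F"))
```

Restoring the incremental after the differential needs only the full backup, the differential, and that incremental, which is why the dependency view differs from the time view.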
>
> So, from that point of view, libvirt tracking the relationship between
> qcow2 bitmaps in order to form checkpoint information can be done ALL
> with libvirt, and without NEEDING the qcow2 file to track any relations
> between bitmaps. BUT, libvirt's job can probably be made easier if
> qcow2 would, at the least, allow bitmaps to track their parent, and/or
> provide APIs to easily merge a parent..intermediate..child chain of
> related bitmaps into a single bitmap, for easy runtime
> creation of the temporary bitmap used to express the delta between two
> checkpoints.
>
>
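The chain merge described above can be sketched as follows. This is an illustration under stated assumptions, not qemu's or libvirt's implementation: Python sets of dirty cluster indices stand in for bitmaps, the bitmap attached to a checkpoint is assumed to record what changed from that checkpoint until its child was created, and `delta_between` is a hypothetical name.

```python
def delta_between(parents, dirty, older, newer):
    """Merge the bitmaps covering the interval from `older` up to `newer`.

    parents: checkpoint name -> parent name (None for the root)
    dirty:   checkpoint name -> set of clusters dirtied after that
             checkpoint and before its child checkpoint
    """
    if older == newer:
        return set()
    merged = set()
    node = parents[newer]
    while node is not None:
        merged |= dirty[node]  # union is the merge operation
        if node == older:
            return merged
        node = parents[node]
    raise ValueError(older + " is not an ancestor of " + newer)


parents = {"C1": None, "C2": "C1", "C3": "C2"}
dirty = {"C1": {0, 1}, "C2": {1, 5}, "C3": {7}}

print(sorted(delta_between(parents, dirty, "C1", "C3")))
print(sorted(delta_between(parents, dirty, "C2", "C3")))
```

The result is a temporary bitmap; the per-checkpoint bitmaps are left untouched so that other deltas can still be computed later.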
[snip]
Nikolay