On 23.04.2018 16:50, Eric Blake wrote:
On 04/23/2018 05:38 AM, Nikolay Shirokovskiy wrote:
>> I'd argue the converse. Libvirt already knows how to do atomic updates
>> of XML files that it tracks. If libvirtd crashes/restarts in the middle
>> of an API call, you already have indeterminate results of whether the
>> API worked or failed; once libvirtd is restarted, you'll probably
>> have to retry the command. For all other cases, the API call
>> completes, and either no XML changes were made (the command failed and
>> reports the failure properly), or all XML changes were made (the command
>> created the appropriate changes to track the new checkpoint, including
>> whatever bitmap names have to be recorded to map the relation between
>> checkpoints and bitmaps).
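For concreteness, the atomicity Eric refers to is the usual
write-new-then-rename pattern (libvirt's own helper, virFileRewrite,
works roughly this way); a minimal Python sketch, not libvirt's actual
code:

  import os, tempfile

  def atomic_rewrite(path, data):
      # Write to a temp file in the same directory, flush it to disk,
      # then rename over the original: a reader (or a libvirtd that
      # restarts mid-write) sees either the old or the new XML,
      # never a torn file.
      fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
      try:
          with os.fdopen(fd, "w") as f:
              f.write(data)
              f.flush()
              os.fsync(f.fileno())
          os.rename(tmp, path)  # atomic on POSIX filesystems
      except Exception:
          os.unlink(tmp)
          raise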
>
> We can fail to save the XML... Suppose we have B1 and B2 and create bitmap B3
> in the process of creating checkpoint C3. Qemu then creates the snapshot
> and bitmap successfully, libvirt fails to update the XML, and after some
> time libvirt restarts (without even crashing). Now libvirt knows of B1 and B2 but not B3.
Libvirt is in charge of tracking ALL state internally that it requires
to restore state properly across a libvirtd restart, so that it presents
the illusion of a libvirt API atomically completing or failing. If
libvirt creates bitmap B3 but does not create checkpoint C3 prior to it
restarting, then on restart, it should be able to correctly see that B3
is stranded and delete it (rather, merge it back into B2 so that B2
remains the only live bitmap) as part of an incomplete API that failed.
> What can the consequences be? For example if we ask for the bitmap from C2 we
> miss all the changes from C3 as we don't know of B3. This will lead to corrupted
> backups.
Checkpoint C3 does not exist if libvirt API did not complete correctly
(even if bitmap B3 exists). It should merely be a matter of libvirt
making proper annotations of what it plans to do prior to calling into
qemu, so that if it restarts, it can recover from an intermediate state
of failure to follow those plans.
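One way to make those "annotations" concrete is a small write-ahead
intent record, written (atomically, as above) before touching qemu.
Names and helpers here are illustrative, not libvirt's actual code:

  import json, os

  def run_with_intent(journal_path, plan, do_qemu_ops):
      # Persist what we are about to ask qemu for, so a restarted
      # libvirtd can tell a half-finished operation from a completed
      # one and roll it back or finish it.
      with open(journal_path, "w") as f:
          json.dump({"state": "pending", "plan": plan}, f)
          f.flush()
          os.fsync(f.fileno())
      do_qemu_ops(plan)        # e.g. create bitmap B3, take the snapshot
      os.unlink(journal_path)  # success: no journal, nothing to undo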
>
> This can be fixed:
>
> - in qemu. If bitmaps have a child/parent relationship then on libvirt restart
> we can recover (we ask qemu for bitmaps, discover B3 and then discover
> B3 is a child of B2). This is basically how the implementation with the
> naming scheme works. On this path we don't need special metadata in
> libvirt (besides maybe the domain xml attached to the checkpoint etc.)
>
> - in libvirt. If we save the XML before creating a snapshot with a checkpoint,
> this fixes the issue of a successful operation followed by an XML save failure.
> But now we have another issue :) We can save the XML successfully but then the operation
> itself can fail and we fail to revert the XML back. We can recover
> even without child/parent metadata in qemu in this case: just ask
> qemu for bitmaps on libvirt restart, and if a bitmap is missing kick
> its checkpoint out, as that is the case described above (successful XML save, then
> unsuccessful qemu operation)
>
> So it is possible to track bitmaps in libvirt. We just need to be extra careful
> not to produce invalid backups.
Yes, but that's true of any interface where a single libvirt API
controls multiple steps in qemu.
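Nikolay's second option (XML saved first, qemu operation failed) has an
equally simple recovery, the inverse of the stranded-bitmap case above;
again a sketch with a hypothetical qmp() helper:

  def prune_stale_checkpoints(qmp, node, checkpoints):
      # A checkpoint recorded in the XML whose bitmap qemu does not
      # report was half-created: drop it from the metadata.
      live = set()
      for info in qmp("query-block"):
          if info["device"] == node:
              live = {b["name"] for b in info.get("dirty-bitmaps", [])}
      return [cp for cp in checkpoints if cp["bitmap"] in live]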
>
>>
>> Consider the case of internal snapshots. Already, we have the case
>> where qemu itself does not track enough useful metadata about internal
>> snapshots (right now, just a name and timestamp of creation); so libvirt
>> additionally tracks further information in <domainsnapshot>: the name,
>> timestamp, relationship to any previous snapshot (libvirt can then
>> reconstruct a tree relationship between all snapshots; where a parent
>> can have more than one child if you roll back to a snapshot and then
>> execute the guest differently), the set of disks participating in the
>> snapshot, and the <domain> description at the time of the snapshot (if
>> you hotplug devices, or even the fact that creating external snapshots
>> changes which file is the active qcow2 in a backing chain, you'll need
>> to know how to roll back to the prior domain state as part of
>> reverting). This is approximately the same set of information that a
>> <domaincheckpoint> will need to track.
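As a rough picture of that metadata set (field names invented for
illustration, not libvirt's actual schema):

  from dataclasses import dataclass, field
  from typing import List, Optional

  @dataclass
  class Checkpoint:
      name: str
      created: int                    # creation timestamp
      parent: Optional[str] = None    # previous checkpoint, None for root
      children: List[str] = field(default_factory=list)  # >1 after a revert
      disks: List[str] = field(default_factory=list)     # participating disks
      domain_xml: str = ""            # <domain> description at creation time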
>
> I would differentiate checkpoints and backups. For example in the case
> of push backups we can store additional metadata in <domainbackup>
> so later we can revert to a previous state. But checkpoints
> (bitmaps technically) exist only to make incremental backups (restores?).
> We can attach extra metadata to checkpoints but that looks accidental, just because
> bitmaps and backups relate to the same point in time. To me a (push) backup
> can carry all the metadata, and a backup can have an
> associated checkpoint or not. For example if we choose to always
> make full backups we don't need checkpoints at all (at least if we are
> not going to use them for restore).
It's still nice to track the state of the <domain> XML at the time of
the backup, even if you aren't using checkpoints.
>
>>
>> I'm slightly tempted to just overload <domainsnapshot> to track three
>> modes instead of two (internal, external, and now checkpoint); but I think
>> that will probably be a bit too confusing, so more likely I will create
>> <domaincheckpoint> as a new object, but copy a lot of coding paradigms
>> from <domainsnapshot>.
>
> I wonder if you are going to use a tree or a list structure for backups.
> To me it is much easier to think of backups just as a sequence of states
> in time. For example consider the Grandfather-Father-Son scheme of Acronis backups [1].
> A typical backup sequence can look like:
>
> F - I - I - I - I - D - I - I - I - I - D
Libvirt already tracks snapshots as a tree rather than a list; so I see
no reason why checkpoints should be any different. You don't branch in
the tree unless you revert to an earlier point (so the tree is often
linear in practice), but just because branching isn't common doesn't
mean it can't happen.
My point is that trees are not that useful for backups (and as a result
for checkpoints). Let's suppose we use a tree structure for backups and
the tree is based on the backing file relationship.
First, as quoted below from my previous letter, we will have trees even without
restores, because for example we can have differential backups in the backup schedule.
Second, if after an *incremental* restore you make a backup based on the state you
restored from, and not on the state dictated by the backup schedule
(the previous day's state, for example), you gain some disk space
I guess, but there are also disadvantages:
- you need to keep the old state you restored from for a longer time, while
backup retention policies usually tend to keep only recent states
- backup needs to be aware of restore, so that if you restored in the
morning and a backup is scheduled in the evening, you know that the backup
should be based on the state you restored from
- the backup schedule is disrupted (the Wednesday backup should be incremental
but due to the restore it becomes differential, for example)
Another argument is that the checkpoint for the state you restored to
can be missing, so you will need to do a *full* restore; then there is
no point in having the state you restored from as the parent, as you rewrite the
whole disk.
In short I believe it is much simpler to think of restore as a process
unrelated to backups. So after a restore you make your regular scheduled
backup just as if the disk had been changed by the guest. The cause of the disk
changes does not matter and the backup schedule continues to create VM backups
day by day.
Nikolay
>
> Where F is a full monthly backup, I an incremental daily backup and D a
> differential weekly backup (no backups on Sunday and Saturday).
> This is the representation from the time POV. From the backup dependencies POV it
> looks like this:
>
> F - I - I - I - I   D - I - I - I - I   D
>  \------------------|                   |
>   \-------------------------------------|
>
> or more common representation:
>
> F - I - I - I - I
> \- D - I - I - I - I
> \- D - I - I - I - I
>
> To me using a tree structure in snapshots is appropriate because each branching
> point is some semantic state ("basic OS installed") and branches are different
> trials from that point. In the backup case I guess we don't want branching on
> recovery to some backup, we just want to keep the selected backup scheme going.
> So for example if we recover on Wednesday to the previous week's Friday, then
> later on Wednesday we will have the regular Wednesday backup as if we had not
> recovered. This makes things simple for the client, or he will drown in
> dependencies (especially after a couple of recoveries).
>
> Of course internally we need to track backup dependencies in order to
> properly delete backups or recover from them.
>
> [1] https://www.acronis.com/en-us/support/documentation/AcronisBackup_11.5/in...
>>
>> So, from that point of view, libvirt tracking the relationship between
>> qcow2 bitmaps in order to form checkpoint information can be done ALL
>> with libvirt, and without NEEDING the qcow2 file to track any relations
>> between bitmaps. BUT, libvirt's job can probably be made easier if
>> qcow2 would, at the least, allow bitmaps to track their parent, and/or
>> provide APIs to easily merge a parent..intermediate..child chain of
>> related bitmaps to be merged into a single bitmap, for easy runtime
>> creation of the temporary bitmap used to express the delta between two
>> checkpoints.
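That chain merge is easy to sketch with the QMP commands of the shape
qemu later grew (block-dirty-bitmap-add existed already; the merge
command did not yet at the time of this thread, so its name and
signature below are an assumption), again with a hypothetical qmp()
helper:

  def delta_bitmap(qmp, node, chain, tmp_name="libvirt-tmp-delta"):
      # chain: bitmap names from parent through child, covering the
      # span between the two checkpoints of interest.
      qmp("block-dirty-bitmap-add", node=node, name=tmp_name)
      qmp("block-dirty-bitmap-merge", node=node,
          target=tmp_name, bitmaps=list(chain))
      return tmp_name  # hand to the backup job; remove when done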
>>
>>
>
> [snip]
>
> Nikolay
>