On 06/26/2018 10:56 AM, Nir Soffer wrote:
> On Wed, Jun 13, 2018 at 7:42 PM Eric Blake <eblake(a)redhat.com> wrote:
>> Upcoming patches plan to introduce virDomainCheckpointPtr as a new
>> object for use in incremental backups, along with documentation of
>> how incremental backups differ from snapshots. But first, we need
>> to rename any existing mention of a 'system checkpoint' to instead
>> be a 'full system state snapshot', so that we aren't overloading
>> the term checkpoint.
>>
> I want to refer only to the new concept of checkpoint, compared with
> snapshot.
> I think checkpoint should refer to the current snapshot. When you perform
> a backup, you should get the changed blocks in the current snapshot.
That is an incremental backup (copying only the blocks that have changed
since some previous point in time) - and my design was that such points
in time are named 'checkpoints', where the most recent checkpoint is the
current checkpoint. This is different from a snapshot (which is enough
state that you can revert back to that point in time directly) - a
checkpoint only carries enough information to perform an incremental
backup rather than a rollback to earlier state.
> When you restore, you want to restore several complete snapshots,
> and one partial snapshot, based on the backups of that snapshot.
I'm worried that overloading the term "snapshot" and/or "checkpoint"
can make it difficult to see whether we are describing the same data
motions.
You are correct that rolling a virtual machine back to the state
represented by a series of incremental backups will require
reconstructing the state present on the machine at the desired
rollback point. But I'll have to read your example first to see if we're
on the same page.
> Let's try an example:
>
> T1
> - user creates a new VM marked for incremental backup
> - system creates base volume (S1)
> - system creates new dirty bitmap (B1)
Why do you need a dirty bitmap on a brand new system? By definition, if
the VM is brand new, every sector that the guest touches will be part of
the first incremental backup, which is no different from taking a full
backup of every sector. But if it makes life easier by following
consistent patterns, I also don't see a problem with creating a first
checkpoint at the time an image is first created (my API proposal would
allow you to create a domain, start it in the paused state, create a
checkpoint, and then resume the guest so that it can start executing).
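
To make that flow concrete, here is a rough C sketch against the
proposed API; virDomainCheckpointCreateXML() and the checkpoint XML are
still only part of the proposal, so treat those names as tentative:

#include <libvirt/libvirt.h>

/* Sketch only: create a domain with its CPUs paused, record an
 * initial checkpoint, then let the guest start executing.  The
 * checkpoint call is the proposed API, not yet in released libvirt. */
static int
create_with_initial_checkpoint(virConnectPtr conn, const char *domxml)
{
    virDomainPtr dom;
    virDomainCheckpointPtr chk;

    /* Guest CPUs start paused, so nothing is written before the
     * first checkpoint exists. */
    if (!(dom = virDomainCreateXML(conn, domxml, VIR_DOMAIN_START_PAUSED)))
        return -1;

    /* Proposed entry point: name a point in time to back up from later. */
    chk = virDomainCheckpointCreateXML(dom,
        "<domaincheckpoint><name>clean-slate</name></domaincheckpoint>", 0);
    if (!chk) {
        virDomainDestroy(dom);
        virDomainFree(dom);
        return -1;
    }

    virDomainResume(dom);  /* guest runs; its writes are now tracked */

    virDomainCheckpointFree(chk);
    virDomainFree(dom);
    return 0;
}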
> T2
> - user creates a snapshot
> - dirty bitmap in original snapshot deactivated (B1)
> - system creates new snapshot (S2)
> - system starts new dirty bitmap in the new snapshot (B2)
I'm still worried that interactions between snapshots (where the backing
chain grows) and bitmaps may present interesting challenges. But what
you are describing here is that the act of creating a snapshot (to
enlarge the backing chain) also has the effect of creating a checkpoint (a
new point in time for tracking incremental changes since the creation of
the snapshot). Whether we have to copy B1 into image S2, or whether
image S2 can get by with just bitmap B2, is an implementation detail.
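
For reference, one plausible qemu-level sequence for that handoff (node
and bitmap names invented; disabling a bitmap is still the experimental
x-block-dirty-bitmap-disable command as I write this) would be:

{"execute": "x-block-dirty-bitmap-disable",
 "arguments": {"node": "drive0", "name": "B1"}}
{"execute": "blockdev-snapshot-sync",
 "arguments": {"device": "drive0", "snapshot-file": "S2.qcow2",
               "format": "qcow2"}}
{"execute": "block-dirty-bitmap-add",
 "arguments": {"node": "drive0", "name": "B2", "persistent": true}}

that is: freeze B1 while S1 is still the active layer, grow the chain,
then start B2 in the new active image S2.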
> T3
> - user creates new checkpoint
> - system deactivates current dirty bitmap (B2)
> - system creates new dirty bitmap (B3)
> - user backs up data in snapshot S2 using dirty bitmap B2
> - user backs up data in snapshot S1 using dirty bitmap B1
So here you are performing two incremental backups. Note: the user can
already back up S1 without using any new APIs, and without reference to
bitmap B1 - that's because B1 was started when S1 was created, and
closed out when S1 was no longer modified - but now that S1 is a
read-only file in the backing chain, copying S1 is the same as copying
the clusters covered by bitmap B1.
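
That is, something as simple as (output name invented):

qemu-img convert -f qcow2 -O qcow2 S1 S1.bak

is a valid backup of everything up through T2, because the active layer
S2 guarantees S1 can no longer change underneath the copy.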
Also, my current API additions do NOT make it easy to grab just the
incremental data covered by bitmap B1 at time T3; rather, the time to
grab the copy of the data covered just by B1 is at time T2 when you
create bitmap B2 (whether or not you also create file S2). The API
additions as I have proposed them only make it easy to grab a full
backup of all data up to time T3 (no checkpoint as its start), an
incremental backup of all data since T1 (checkpoint T1 as its start,
using the merge of B1 and B2 to learn which clusters to grab), or an
incremental backup of all data since T2 (checkpoint T2 as its start,
using B2 to learn which clusters to grab).
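
In other words, with the proposed virDomainBackupBegin(dom, backupXml,
checkpointXml, flags) (argument spelling still subject to change), the
three easy backups at T3 look roughly like:

/* full backup of all data up to T3: omit <incremental> */
virDomainBackupBegin(dom, "<domainbackup/>", NULL, 0);

/* incremental since T1: clusters from the merge of B1 and B2 */
virDomainBackupBegin(dom,
    "<domainbackup><incremental>T1</incremental></domainbackup>",
    NULL, 0);

/* incremental since T2: clusters covered by B2 only */
virDomainBackupBegin(dom,
    "<domainbackup><incremental>T2</incremental></domainbackup>",
    NULL, 0);

where the checkpoint names T1/T2 are whatever was used when those
checkpoints were created.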
If you NEED to grab an incremental backup whose history is NOT bounded
by the current moment in time, then we need to rethink the operations we
are offering via my new API. On the bright side, since my API for
virDomainBackupBegin() takes an XML description, we DO have the option
of enhancing that XML to take a second point in time as the end boundary
(it already has an optional <incremental> tag as the first point in time
for the start boundary; or a full backup if that tag is omitted) - if we
enhance that XML, we'd also have to figure out how to map it to the
operations that qemu exposes. (The blockdev-backup command makes it
easy to grab an incremental backup ending at the current moment in time,
by using the "sync":"none" option to a temporary scratch file so that
further guest writes do not corrupt the data to be grabbed from that
point in time - but it does NOT make it easy to see the state of data
from an earlier point in time - I'll demonstrate that below).
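
For reference, the qemu command in question looks like (node names
invented):

{"execute": "blockdev-backup",
 "arguments": {"job-id": "backup0", "device": "drive0",
               "target": "scratch0", "sync": "none"}}

where scratch0 is a temporary overlay whose backing file is the active
image: guest writes to drive0 first copy the old cluster into scratch0,
so reading through scratch0 keeps serving the point-in-time view that
existed when the job started.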
> T4
> - user creates new checkpoint
> - system deactivates current dirty bitmap (B3)
> - system creates new dirty bitmap (B4)
> - user backs up data in snapshot S2 using dirty bitmap B3
Yes, this is similar to what was done at T3, without the complication of
trying to grab an incremental backup whose end boundary is not the
current moment in time.
> Let's say the user wants to restore to the state as it was at T3.
> This is the data kept by the backup application:
> - snapshots
>   - S1
>     - checkpoints
>       - B1
>   - S2
>     - checkpoints
>       - B2
>       - B3
> T5
> - user starts restore to the state at time T3
> - user creates a new disk
> - user creates empty snapshot S1
> - user uploads snapshot S1 data to storage
> - user creates empty snapshot disk S2
> - user uploads snapshot S1 data to storage
Presumably, this would be 'user uploads S2 to storage', not S1. But
restoring in this manner didn't make any use of your incremental backups.
Maybe what I need to do is give a more visual indication of what
incremental backups store.
At T1, we create S1 and start populating it. As this was a brand new
guest, the storage starts empty. Since you mentioned B1, I'll show it
here, even though I argued it is pointless except for keeping the
pattern consistent with later cases:
S1: |--------|
B1: |--------|
guest sees: |--------|
At T2, the guest has written things, so we now have:
S1: |AAAA----|
B1: |XXXX----|
guest sees: |AAAA----|
where A is the contents of the data the guest has written, and X is an
indication in the bitmap which sections are dirty.
Also at time T2, we create a snapshot S2, making S1 become a read-only
picture of the state of the disk at T2; we also started bitmap B2 on S2
to track what the guest does:
S1: |AAAA----| <- S2: |--------|
B1: |XXXX----| B2: |--------|
We can copy S1 to S1.bak at any point in time now that S1 is read-only:
S1.bak: |AAAA----|
At T3, the guest has written things, so we now have:
S1: |AAAA----| <- S2: |---BBB--|
B1: |XXXX----| B2: |---XXX--|
guest sees: |AAABBB--|
so at this point, we freeze B2 and create B3; the new
virDomainBackupBegin() API will let us also access the following copies
at this time:
S1: |AAAA----| <- S2: |---BBB--|
B1: |XXXX----| B2: |---XXX--|
B3: |--------|
full3.bak (no checkpoint as starting point): |AAABBB--|
S2.bak (checkpoint B2 as starting point): |---BBB--|
S2.bak by itself does not match anything the guest ever saw, but you can
string together:
S1.bak <- S2.bak
to reconstruct the state the guest saw at T3. By T4, the guest has made
more edits:
S1: |AAAA----| <- S2: |D--BBDD-|
B1: |XXXX----| B2: |---XXX--|
B3: |X----XX-|
guest sees: |DAABBDD-|
and as before, we now create B4, and have the option of several backups
(usually, you'll only grab the most recent incremental backup, and not
multiple backups; this is more an exploration of what is possible):
full4.bak (no checkpoint as starting): |DAABBDD-|
S2_3.bak (B2 as starting point, covering merge of B2 and B3): |D--BBDD-|
S3.bak (B3 as starting point): |D----DD-|
Note that both (S1.bak <- S2_3.bak) and (S1.bak <- S2.bak <- S3.bak)
result in the same reconstructed guest image at time T4. Also note that
reading the contents of bitmap B2 in isolation at this time is NOT
usable (you'd get |---BBD--|, which has mixed the incremental difference
from T2 to T3 with a subset of the difference from T3 to T4, so it NO
LONGER REPRESENTS the state of the guest at either T2 or T3, even when
used as an overlay on top of S1.bak). Hence, my emphasis that it is
usually important to create your incremental backup at the same time you
start your next bitmap, rather than trying to do it after the fact.
Also, you are starting to see the benefits of incremental backups.
Creating S2_3.bak doesn't necessarily need bitmaps (it results in the
same image as you would get if you create a temporary overlay [S1 <- S2
<- tmp], copy off S2, then live merge tmp back into S2), but both
full4.bak and S2_3.bak had to copy more data than S3.bak.
Later on, if you want to roll back to what the guest saw at T4, you just
have to restore [S1.bak <- S2.bak <- S3.bak] as your backing chain to
provide the data the guest saw at that time.
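
Mechanically, that is just re-pointing each piece at its base after
downloading them, e.g. (assuming the .bak files are qcow2):

qemu-img rebase -u -f qcow2 -b S1.bak -F qcow2 S2.bak
qemu-img rebase -u -f qcow2 -b S2.bak -F qcow2 S3.bak

after which S3.bak behaves as a disk with the full T4 contents, with
reads of untouched clusters falling through to S2.bak and then S1.bak.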
> John, are dirty bitmaps implemented in this way in qemu?
The whole point of the libvirt API proposals is to make it possible to
create bitmaps in qcow2 images at the point where you are creating
incremental backups, so that the next incremental backup can be created
using the previous one as its base.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org