Re: [PATCH 32/32] kbase: Add document outlining internals of incremental backup in qemu

22 Jun 2020

On Mon, Jun 22, 2020 at 02:40:18 +0300, Nir Soffer wrote:
...
On Mon, Jun 15, 2020 at 8:13 PM Peter Krempa <pkrempa@redhat.com> wrote:
...
Outline the basics and how to integrate with externally created
overlays. Other topics will continue later.
Thanks, this is very helpful!
...
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
---
 docs/kbase.html.in                        |   3 +
 docs/kbase/incrementalbackupinternals.rst | 210 ++++++++++++++++++++++
 2 files changed, 213 insertions(+)
 create mode 100644 docs/kbase/incrementalbackupinternals.rst
[...]
...
...
+
+Checking bitmap health
+----------------------
+
+QEMU optimizes disk writes by updating the bitmaps stored in the image only in
+certain cases. This can leave the bitmaps in an inconsistent state, e.g. when
+QEMU crashes.
+
+For a chain of corresponding bitmaps in a backing chain to be considered valid
+and eligible for use with ``virDomainBackupBegin``, it must conform to the
+following rules:
+
+1) The top image must contain the bitmap.
+2) If any of the backing images in the chain contain the bitmap too, all
+   contiguous images must have the bitmap (no gaps).
+3) All of the above bitmaps must be marked as active
+   (``auto`` flag in ``qemu-img`` output, ``recording`` in qemu).
+4) None of the above bitmaps can be inconsistent
+   (``in-use`` flag in ``qemu-img`` output for an image which is not currently
+   opened by a qemu instance, or ``inconsistent`` in qemu).
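The four rules above can be illustrated with a small bash function. This is a minimal sketch, not a port of libvirt's ``qemuBlockBitmapChainIsValid``; it assumes the caller has already reduced each image's copy of one bitmap to a hypothetical state token (``absent``, ``active``, ``disabled`` or ``inconsistent``), listed from the top image down to the base.

```shell
#!/usr/bin/env bash
# Sketch of the bitmap chain rules for ONE bitmap name.
# Arguments: one state per image, ordered from the top image to the base.
# State tokens are a hypothetical encoding chosen for this example:
#   absent        - the image has no bitmap with this name
#   active        - 'auto' flag set, 'in-use' flag not set
#   disabled      - bitmap present, but without the 'auto' flag
#   inconsistent  - 'in-use' set while no qemu has the image open
# Returns 0 if the chain is usable for an incremental backup, 1 otherwise.
bitmap_chain_is_valid() {
    local found=0 chain_ended=0 state

    for state in "$@"; do
        if [ "$state" = "absent" ]; then
            # rule 1: the top image must contain the bitmap
            [ "$found" -eq 1 ] || return 1
            chain_ended=1
            continue
        fi

        # rule 2: no gaps, the bitmap must not reappear below a missing layer
        [ "$chain_ended" -eq 0 ] || return 1

        # rules 3 and 4: the bitmap must be active and consistent
        [ "$state" = "active" ] || return 1

        found=1
    done

    # an empty chain has no top image, so it cannot be valid
    [ "$found" -eq 1 ]
}
```

For example, ``bitmap_chain_is_valid active active absent`` succeeds (the bitmap exists in the two topmost images only), while ``bitmap_chain_is_valid active absent active`` fails on the gap rule.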
Can you add a chapter about the old format and how it was different from the
new format?
No, I don't plan to do that. This feature has not been enabled upstream
yet, so apart from the users who hacked in the support by adding the
correct capability, nobody has been exposed to bitmaps managed by libvirt
so far.

As such, I don't feel it's necessary or even worth documenting the old
state.
...
Looks like the differences are:
- all bitmaps are always active, so there is no need to enable or disable them
- based on the next section, new snapshots contain all the bitmaps from the
  previous snapshots (since all of them are active and we copy all of them to
  the new snapshot)
Yes.
...
How does qemu know which bitmap should track changes if all bitmaps are active?
When libvirt starts a VM, it knows nothing about the checkpoints. We define the
checkpoints right before the first backup after starting a VM, so both libvirt
and qemu know nothing about the bitmaps at this point.
QEMU tracks changes for all active bitmaps, that's why they are active.

Libvirt doesn't care at startup, but cares if you want to do a backup.
...
Do you expect to have the checkpoints defined before the guest is started?
If I understand this correctly, if we do:
- create base image
- do full backup (check-1)
- do incremental backup 1 (check-2)
- create snapshot-1
- do incremental backup 2 (check-3)
- do incremental backup 3 (check-4)
- create snapshot-2
- do incremental backup 4 (check-5)
This will be the image structure:
- base image
    - check-1
    - check-2
- snapshot-1
    - check-1
    - check-2
    - check-3
    - check-4
- snapshot-2
    - check-1
    - check-2
    - check-3
    - check-4
    - check-5
So we are duplicating bitmaps that have no content in all snapshots?
Yes. It's way easier to manage them that way.
...
Why not copy only the last (current) bitmap?
- base image
    - check-1
    - check-2
- snapshot-1
    - check-2
    - check-3
    - check-4
- snapshot-2
    - check-4
    - check-5
This is too complicated to deal with. That's what we did so far and
it's a nightmare to compute what to do.

Ideally, when 'block-dirty-bitmap-populate' materializes, I'll add a flag
to the API creating snapshots which will stop pulling bitmaps up, so you'll
end up with:

 - base image
     - check-1
     - check-2
 - snapshot-1
     - check-3
     - check-4
 - snapshot-2
     - check-5

Since the rest can be re-calculated, it actually doesn't make sense to
use any bitmap after a snapshot.
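The two layouts discussed above (the current one, where every new overlay receives copies of all bitmaps of the previous top image, and the one the proposed flag would produce) can be reproduced with a toy model. It is only a sketch: it tracks bitmap names, not their contents, and the policy names (``pull-up``, ``no-pull-up``) and the event format are invented for illustration. The ``no-pull-up`` policy models a flag that does not exist yet.

```shell
#!/usr/bin/env bash
# Toy model: which bitmap names end up in which image of a backing chain.
# bitmap_layout POLICY BASE EVENT...
#   POLICY: "pull-up"    - a new overlay gets (empty) copies of all
#                          bitmaps of the previous top image
#           "no-pull-up" - a new overlay starts without any bitmaps
#   EVENT:  "backup NAME"   - a checkpoint adds bitmap NAME to the top image
#           "snapshot NAME" - overlay NAME is created on top of the chain
# Prints "IMAGE: BITMAP..." for each image, base image first.
bitmap_layout() {
    local policy="$1" base="$2"
    shift 2
    local -a images=("$base") bitmaps=("")
    local ev kind name top i

    for ev in "$@"; do
        kind=${ev%% *}
        name=${ev#* }
        top=$(( ${#images[@]} - 1 ))
        case "$kind" in
            backup)
                # append NAME to the space-separated list of the top image
                bitmaps[top]="${bitmaps[top]:+${bitmaps[top]} }$name"
                ;;
            snapshot)
                images+=("$name")
                if [ "$policy" = "pull-up" ]; then
                    bitmaps+=("${bitmaps[top]}")
                else
                    bitmaps+=("")
                fi
                ;;
        esac
    done

    for i in "${!images[@]}"; do
        echo "${images[i]}:${bitmaps[i]:+ ${bitmaps[i]}}"
    done
}
```

Feeding it the sequence from the question above (``backup check-1``, ``backup check-2``, ``snapshot snapshot-1``, ``backup check-3``, ``backup check-4``, ``snapshot snapshot-2``, ``backup check-5``) with ``pull-up`` reproduces the duplicated layout, while ``no-pull-up`` leaves each image with only the bitmaps created while it was the top image.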
...
...
+::
+
+  # check that image has bitmaps
+  $ qemu-img info vda-1.qcow2
+   image: vda-1.qcow2
+   file format: qcow2
+   virtual size: 100 MiB (104857600 bytes)
+   disk size: 220 KiB
+   cluster_size: 65536
+   Format specific information:
+       compat: 1.1
+       compression type: zlib
+       lazy refcounts: false
+       bitmaps:
+           [0]:
+               flags:
+                   [0]: in-use
+                   [1]: auto
+               name: chk-a
+               granularity: 65536
+           [1]:
+               flags:
+                   [0]: auto
+               name: chk-b
+               granularity: 65536
+       refcount bits: 16
+       corrupt: false
+
+(See also the ``qemuBlockBitmapChainIsValid`` helper method in
+``src/qemu/qemu_block.c``)
Looks like oVirt needs to implement this as well, otherwise we will waste time
creating bitmaps on a snapshot, only for libvirt to fail the backup later
because there was an inconsistent or disabled bitmap.
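The per-bitmap part of such a check on the oVirt side could look roughly like the following bash helper. It is a sketch under assumptions: the helper name is hypothetical, its input is the newline-separated flag list of one bitmap (for example what ``jq -r '.flags[]'`` yields in the script from the patch), and it treats ``in-use`` as inconsistent, which only holds when the image is not opened by a running qemu.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify one bitmap from the newline-separated list
# of flags that 'qemu-img info' reports for it.
# Prints one of: active, disabled, inconsistent.
# Assumes the image is NOT open in a running qemu; while qemu has it open,
# 'in-use' is expected and does not mean the bitmap is inconsistent.
bitmap_state_from_flags() {
    local flags="$1" has_auto=0 has_in_use=0 flag

    while IFS= read -r flag; do
        case "$flag" in
            auto)   has_auto=1 ;;
            in-use) has_in_use=1 ;;
        esac
    done <<< "$flags"

    if [ "$has_in_use" -eq 1 ]; then
        echo inconsistent
    elif [ "$has_auto" -eq 1 ]; then
        echo active
    else
        echo disabled
    fi
}
```

With the ``qemu-img info`` output quoted above, and assuming ``vda-1.qcow2`` is not in use by a running qemu, ``chk-a`` (flags ``in-use`` and ``auto``) would be classified as ``inconsistent`` and ``chk-b`` (flag ``auto``) as ``active``.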
...
+Creating external checkpoints manually
+--------------------------------------
+
+To create the same topology outside of libvirt (e.g. when doing snapshots
+offline), a new ``qemu-img`` which supports the ``bitmap`` subcommand is
+necessary. The
+following algorithm then ensures that the new image after snapshot will work
+with backups (note that ``jq`` is a JSON processor):
+
+::
+
+  # arguments
+  SNAP_IMG="vda-2.qcow2"
+  BACKING_IMG="vda-1.qcow2"
+
+  # constants - snapshots and bitmaps work only with qcow2
+  SNAP_FMT="qcow2"
+  BACKING_IMG_FMT="qcow2"
+
+  # create snapshot overlay
+  qemu-img create -f "$SNAP_FMT" -F "$BACKING_IMG_FMT" -b "$BACKING_IMG" "$SNAP_IMG"
+
+  BACKING_IMG_INFO=$(qemu-img info --output=json -f "$BACKING_IMG_FMT" "$BACKING_IMG")
+  BACKING_BITMAPS=$(jq '."format-specific".data.bitmaps' <<< "$BACKING_IMG_INFO")
+
+  if [ "x$BACKING_BITMAPS" == "xnull" ]; then
+      exit 0
+  fi
+
+  for BACKING_BITMAP_ in $(jq -c '.[]' <<< "$BACKING_BITMAPS"); do
+      BITMAP_FLAGS=$(jq -c -r '.flags[]' <<< "$BACKING_BITMAP_")
+      BITMAP_NAME=$(jq -r '.name' <<< "$BACKING_BITMAP_")
+
+      if grep -q 'in-use' <<< "$BITMAP_FLAGS" ||
+         ! grep -q 'auto' <<< "$BITMAP_FLAGS"; then
+         continue
+      fi
+
+      qemu-img bitmap -f "$SNAP_FMT" "$SNAP_IMG" --add "$BITMAP_NAME"
So what we have to do is:
- get a list of bitmaps that should be in this disk from the oVirt engine
- get a list of bitmaps with the "auto" flag and without the "in-use" flag
  in the backing file
- if the lists do not match, we can delete all bitmaps and relevant
  checkpoints on the engine side, since the next backup will fail anyway
- if the lists match, maybe verify that bitmaps are not missing in
  lower layers (gaps)
- if the lists match, create an empty bitmap with the same name and
  granularity in the top image
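The comparison step and the final bitmap creation from the list above might be sketched like this. The function names are hypothetical, both lists are assumed to be newline-separated bitmap names, the granularity is assumed to come from the backing image's ``qemu-img info`` output, and the gap check is left out.

```shell
#!/usr/bin/env bash
# bitmap_lists_match EXPECTED ACTUAL
#   EXPECTED: bitmap names the engine expects for this disk
#   ACTUAL:   bitmap names found in the backing image with the 'auto'
#             flag and without the 'in-use' flag
# Both are newline-separated; order, empty lines and duplicates are ignored.
# Returns 0 if both sets contain the same names, 1 otherwise.
bitmap_lists_match() {
    local expected actual
    expected=$(printf '%s\n' "$1" | sed '/^$/d' | sort -u)
    actual=$(printf '%s\n' "$2" | sed '/^$/d' | sort -u)
    [ "$expected" = "$actual" ]
}

# add_empty_bitmap IMAGE NAME GRANULARITY
# Creates an empty, enabled persistent bitmap in the qcow2 overlay IMAGE,
# using the same name and granularity as the bitmap in the backing image.
add_empty_bitmap() {
    local img="$1" name="$2" granularity="$3"
    qemu-img bitmap -f qcow2 --add -g "$granularity" "$img" "$name"
}
```

If ``bitmap_lists_match`` fails, the engine-side checkpoints can be dropped as described above; otherwise ``add_empty_bitmap`` would be called once for every name in the list.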
What do we have to do for the old format? Or do we just not implement this
until we get the new format?
The above algorithm will work even for the old format, by the way.
There's just one active bitmap at any point. The validation algorithm
obviously will not work.

The problem of the "old" format is not in snapshots, but rather in any
other operation.
...
Can you add a section explaining how bitmaps should be handled in blockCommit?
We may need to implement this for cold merge using qemu-img commit.
Yes, right after this gets merged. By the way, block commit is one of
the things where the 'old' format was becoming insane.
...
Looking at the current structure, it looks like we have to do:
1. commit top layer to base layer (we support only one layer commit)
2. merge all bitmaps from top layer to base layer
3. copy all bitmaps in top layer that are not in base layer to base layer
Yes, for that special case it's true. Technically I'd write it as:

2) create bitmaps in the base layer which are in top but not in base (they
   are empty now)
3) merge all bitmaps from top into the corresponding bitmaps in base
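The order described above (commit, then create the bitmaps missing in base, then merge) can be modelled as a dry run that only prints the planned steps. This is a sketch: the function names and the newline-separated name lists are assumptions, nothing is executed, and turning the steps into actual ``qemu-img commit`` and ``qemu-img bitmap --add``/``--merge`` invocations would still need to be worked out and tested.

```shell
#!/usr/bin/env bash
# normalize_names LIST: drop empty lines, sort and deduplicate a
# newline-separated list of bitmap names
normalize_names() {
    printf '%s\n' "$1" | sed '/^$/d' | sort -u
}

# plan_commit_bitmaps TOP BASE TOP_NAMES BASE_NAMES
# Dry-run plan for committing one top layer into its base while keeping
# the checkpoint bitmaps. Prints one planned step per line.
plan_commit_bitmaps() {
    local top="$1" base="$2" top_names="$3" base_names="$4" name

    # 1) commit the data of the top layer into the base layer
    echo "commit $top into $base"

    # 2) bitmaps that exist only in top are created, empty, in base
    while IFS= read -r name; do
        echo "add $name to $base"
    done < <(comm -23 <(normalize_names "$top_names") \
                      <(normalize_names "$base_names"))

    # 3) every top bitmap is merged into the same-named bitmap in base
    while IFS= read -r name; do
        echo "merge $name from $top into $base"
    done < <(normalize_names "$top_names")
}
```

For example, with ``chk-a``, ``chk-b`` and ``chk-c`` in the top image and only ``chk-a`` and ``chk-b`` in the base, the plan creates ``chk-c`` in the base before merging all three.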