On Wed, Jun 13, 2018 at 7:42 PM Eric Blake <eblake@redhat.com> wrote:
Prepare for new checkpoint and backup APIs by describing the XML
that will represent a checkpoint.  This is modeled heavily after
the XML for virDomainSnapshotPtr, since both represent a point in
time of the guest.  But while a snapshot exists with the intent
of rolling back to that state, a checkpoint instead makes it
possible to create an incremental backup at a later time.

Add testsuite coverage of a minimal use of the XML.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 docs/docs.html.in                          |   3 +-
 docs/domainstatecapture.html.in            |   4 +-
 docs/formatcheckpoint.html.in              | 273 +++++++++++++++++++++++++++++
 docs/schemas/domaincheckpoint.rng          |  89 ++++++++++
 libvirt.spec.in                            |   1 +
 mingw-libvirt.spec.in                      |   2 +
 tests/domaincheckpointxml2xmlin/empty.xml  |   1 +
 tests/domaincheckpointxml2xmlout/empty.xml |  10 ++
 tests/virschematest.c                      |   2 +
 9 files changed, 382 insertions(+), 3 deletions(-)
 create mode 100644 docs/formatcheckpoint.html.in
 create mode 100644 docs/schemas/domaincheckpoint.rng
 create mode 100644 tests/domaincheckpointxml2xmlin/empty.xml
 create mode 100644 tests/domaincheckpointxml2xmlout/empty.xml

diff --git a/docs/docs.html.in b/docs/docs.html.in
index 4c46b74980..11dfd27ba6 100644
--- a/docs/docs.html.in
+++ b/docs/docs.html.in
@@ -79,7 +79,8 @@
           <a href="formatdomaincaps.html">domain capabilities</a>,
           <a href="formatnode.html">node devices</a>,
           <a href="formatsecret.html">secrets</a>,
-          <a href="formatsnapshot.html">snapshots</a></dd>
+          <a href="formatsnapshot.html">snapshots</a>,
+          <a href="formatcheckpoint.html">checkpoints</a></dd>

         <dt><a href="uri.html">URI format</a></dt>
         <dd>The URI formats used for connecting to libvirt</dd>
diff --git a/docs/domainstatecapture.html.in b/docs/domainstatecapture.html.in
index 00ab7e8ee1..4de93c87c8 100644
--- a/docs/domainstatecapture.html.in
+++ b/docs/domainstatecapture.html.in
@@ -154,9 +154,9 @@
         time as a new backup, so that the next incremental backup can
         refer to the incremental state since the checkpoint created
         during the current backup.  Guest state is then actually
-        captured using <code>virDomainBackupBegin()</code>.  <!--See also
+        captured using <code>virDomainBackupBegin()</code>.  See also
         the <a href="formatcheckpoint.html">XML details</a> used with
-        this command.--></dd>
+        this command.</dd>
     </dl>

     <h2><a id="examples">Examples</a></h2>
diff --git a/docs/formatcheckpoint.html.in b/docs/formatcheckpoint.html.in
new file mode 100644
index 0000000000..34507a9f68
--- /dev/null
+++ b/docs/formatcheckpoint.html.in
@@ -0,0 +1,273 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE html>
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <body>
+    <h1>Checkpoint and Backup XML format</h1>
+
+    <ul id="toc"></ul>
+
+    <h2><a id="CheckpointAttributes">Checkpoint XML</a></h2>

id=CheckpointXML?
 
+
+    <p>
+      Domain disk backups, including incremental backups, are one form
+      of <a href="domainstatecapture.html">domain state capture</a>.
+    </p>
+    <p>
+      Libvirt is able to facilitate incremental backups by tracking
+      disk checkpoints, or points in time against which it is easy to
+      compute which portion of the disk has changed.  Given a full
+      backup (a backup created from the creation of the disk to a
+      given point in time, coupled with the creation of a disk
+      checkpoint at that time),

Not clear.
 
and an incremental backup (a backup
+      created from just the dirty portion of the disk between the
+      first checkpoint and the second backup operation),

Also not clear.
 
it is
+      possible to do an offline reconstruction of the state of the
+      disk at the time of the second backup, without having to copy as
+      much data as a second full backup would require.  Most disk
+      checkpoints are created in concert with a backup,
+      via <code>virDomainBackupBegin()</code>; however, libvirt also
+      exposes enough support to create disk checkpoints independently
+      from a backup operation,
+      via <code>virDomainCheckpointCreateXML()</code>.

Thanks for the extra context.
 
+    </p>
+    <p>
+      Attributes of libvirt checkpoints are stored as child elements of
+      the <code>domaincheckpoint</code> element.  At checkpoint creation
+      time, normally only the <code>name</code>, <code>description</code>,
+      and <code>disks</code> elements are settable; the rest of the
+      fields are ignored on creation, and will be filled in by
+      libvirt in for informational purposes

So the user is responsible for creating checkpoints names? Do we use these
the same name in qcow2?
 
+      by <code>virDomainCheckpointGetXMLDesc()</code>.  However, when
+      redefining a checkpoint,
+      with the <code>VIR_DOMAIN_CHECKPOINT_CREATE_REDEFINE</code> flag
+      of <code>virDomainCheckpointCreateXML()</code>, all of the XML
+      described here is relevant.
+    </p>
+    <p>
+      Checkpoints are maintained in a hierarchy.  A domain can have a
+      current checkpoint, which is the most recent checkpoint compared to
+      the current state of the domain (although a domain might have
+      checkpoints without a current checkpoint, if checkpoints have been
+      deleted in the meantime).  Creating or reverting to a checkpoint
+      sets that checkpoint as current, and the prior current checkpoint is
+      the parent of the new checkpoint.  Branches in the hierarchy can
+      be formed by reverting to a checkpoint with a child, then creating
+      another checkpoint.

This seems too complex. Why do we need arbitrary trees of checkpoints?

What is the meaning of reverting a checkpoint?
 
+    </p>
+    <p>
+      The top-level <code>domaincheckpoint</code> element may contain
+      the following elements:
+    </p>
+    <dl>
+      <dt><code>name</code></dt>
+      <dd>The name for this checkpoint.  If the name is specified when
+        initially creating the checkpoint, then the checkpoint will have
+        that particular name.  If the name is omitted when initially
+        creating the checkpoint, then libvirt will make up a name for
+        the checkpoint, based on the time when it was created.
+      </dd>

Why not simplify and require the use to provide a name?
 
+      <dt><code>description</code></dt>
+      <dd>A human-readable description of the checkpoint.  If the
+        description is omitted when initially creating the checkpoint,
+        then this field will be empty.
+      </dd>
+      <dt><code>disks</code></dt>
+      <dd>On input, this is an optional listing of specific
+        instructions for disk checkpoints; it is needed when making a
+        checkpoint on only a subset of the disks associated with a
+        domain (in particular, since qemu checkpoints require qcow2
+        disks, this element may be needed on input for excluding guest
+        disks that are not in qcow2 format); if omitted on input, then
+        all disks participate in the checkpoint.  On output, this is
+        fully populated to show the state of each disk in the
+        checkpoint.  This element has a list of <code>disk</code>
+        sub-elements, describing anywhere from one to all of the disks
+        associated with the domain.

Why not always specify the disks?
 
+        <dl>
+          <dt><code>disk</code></dt>
+          <dd>This sub-element describes the checkpoint properties of
+            a specific disk.  The attribute <code>name</code> is
+            mandatory, and must match either the <code>&lt;target
+            dev='name'/&gt;</code> or an unambiguous <code>&lt;source
+            file='name'/&gt;</code> of one of
+            the <a href="formatdomain.html#elementsDisks">disk
+            devices</a> specified for the domain at the time of the
+            checkpoint.  The attribute <code>checkpoint</code> is
+            optional on input; possible values are <code>no</code>
+            when the disk does not participate in this checkpoint;
+            or <code>bitmap</code> if the disk will track all changes
+            since the creation of this checkpoint via a bitmap, in
+            which case another attribute <code>bitmap</code> will be
+            the name of the tracking bitmap (defaulting to the
+            checkpoint name).

Seems too complicated. Why do we need to support a checkpoint
referencing a bitmap with a different name?

Instead we can have a list of disk that will participate in the checkpoint.
Anything not specified will not participate in the snapshot. The name of
the checkpoint is always the name of the bitmap.
 
+          </dd>
+        </dl>
+      </dd>
+      <dt><code>creationTime</code></dt>
+      <dd>The time this checkpoint was created.  The time is specified
+        in seconds since the Epoch, UTC (i.e. Unix time).  Readonly.
+      </dd>
+      <dt><code>parent</code></dt>
+      <dd>The parent of this checkpoint.  If present, this element
+        contains exactly one child element, name.  This specifies the
+        name of the parent checkpoint of this one, and is used to
+        represent trees of checkpoints.  Readonly.
+      </dd>

I think we are missing here the size of the underlying data in every
disk. This probably means how many dirty bits we have in the bitmaps
referenced by the checkpoint for every disk.
 
+      <dt><code>domain</code></dt>
+      <dd>The inactive <a href="formatdomain.html">domain
+        configuration</a> at the time the checkpoint was created.
+        Readonly.

What do you mean by "inactive domain configuration"?
 
+      </dd>
+    </dl>
+
+    <h2><a id="BackupAttributes">Backup XML</a></h2>
+
+    <p>
+      Creating a backup, whether full or incremental, is done
+      via <code>virDomainBackupBegin()</code>, which takes an XML
+      description of the actions to perform.  There are two general
+      modes for backups: a push mode (where the hypervisor writes out
+      the data to the destination file, which may be local or remote),
+      and a pull mode (where the hypervisor creates an NBD server that
+      a third-party client can then read as needed, and which requires
+      the use of temporary storage, typically local, until the backup
+      is complete). 
+    </p>
+    <p>
+      The instructions for beginning a backup job are provided as
+      attributes and elements of the
+      top-level <code>domainbackup</code> element.  This element
+      includes an optional attribute <code>mode</code> which can be
+      either "push" or "pull" (default push).  Where elements are
+      optional on creation, <code>virDomainBackupGetXMLDesc()</code>
+      can be used to see the actual values selected (for example,
+      learning which port the NBD server is using in the pull model,
+      or what file names libvirt generated when none were supplied).
+      The following child elements are supported:
+    </p>
+    <dl>
+      <dt><code>incremental</code></dt>
+      <dd>Optional. If this element is present, it must name an
+        existing checkpoint of the domain, which will be used to make
+        this backup an incremental one (in the push model, only
+        changes since the checkpoint are written to the destination;
+        in the pull model, the NBD server uses the
+        NBD_OPT_SET_META_CONTEXT extension to advertise to the client
+        which portions of the export contain changes since the
+        checkpoint).  If omitted, a full backup is performed.

Just to make it clear:

For example we start with:

    c1 c2 [c3]

c3 is the active checkpoint.

We create a new checkpoint:

    c1 c2 c3 [c4]

So
- using incremental=c2, we will get data referenced by c2?
- using incremental=c1, we will get data reference by both c1 and c2?

What if we want to backup only data from c1 to c2, not including c3?
I don't have a use case for this, but if we can specify tow checkpoints
this can be possible.

For example:

    <chekpoints from="c1",  to="c2">

Or

    <checkpoints from="c2">
 
+      </dd>
+      <dt><code>server</code></dt>
+      <dd>Present only for a pull mode backup.  Contains the same
+       attributes as the <code>protocol</code> element of a disk
+       attached via NBD in the domain (such as transport, socket,
+       name, port, or tls), necessary to set up an NBD server that
+       exposes the content of each disk at the time the backup
+       started.
+      </dd>

To get the list of changed blocks, we planned to use something like:

    qemu-img map nbd+unix:/socket=server.sock

Is this possible now? planned?

To get the actual data, oVirt needs a device to read from. We don't want
to write nbd-client, and we cannot use qemu-img since it does not support
streaming data, and we want to stream data using http to the backup
application.

I guess we will have do this:

   qemu-nbd -c /dev/nbd0 nbd+unix:/socket=server.sock

And serve the data from /dev/nbd0.
 
+      <dt><code>disks</code></dt>
+      <dd>This is an optional listing of instructions for disks
+        participating in the backup (if omitted, all disks
+        participate, and libvirt attempts to generate filenames by
+        appending the current timestamp as a suffix). When provided on
+        input, disks omitted from the list do not participate in the
+        backup.  On output, the list is present but contains only the
+        disks participating in the backup job.  This element has a
+        list of <code>disk</code> sub-elements, describing anywhere
+        from one to all of the disks associated with the domain.
+        <dl>
+          <dt><code>disk</code></dt>
+          <dd>This sub-element describes the checkpoint properties of
+            a specific disk.  The attribute <code>name</code> is
+            mandatory, and must match either the <code>&lt;target
+            dev='name'/&gt;</code> or an unambiguous <code>&lt;source
+            file='name'/&gt;</code> of one of
+            the <a href="formatdomain.html#elementsDisks">disk
+            devices</a> specified for the domain at the time of the
+            checkpoint.  The optional attribute <code>type</code> can
+            be <code>file</code>, <code>block</code>,
+            or <code>networks</code>, similar to a disk declaration
+            for a domain, controls what additional sub-elements are
+            needed to describe the destination (such
+            as <code>protocol</code> for a network destination).  In
+            push mode backups, the primary subelement
+            is <code>target</code>; in pull mode, the primary sublement
+            is <code>scratch</code>; but either way,
+            the primary sub-element describes the file name to be used
+            during the backup operation, similar to
+            the <code>source</code> sub-element of a domain disk. An
+            optional sublement <code>driver</code> can also be used to
+            specify a destination format different from qcow2.

This should be similar to the way we specify disks for vm, right?
Anything that works as vm disk will work for pushing backups?
 
I will continue with the rest of the patch later.

Nir

+          </dd>
+        </dl>
+      </dd>
+    </dl>
+
+    <h2><a id="example">Examples</a></h2>
+
+    <p>Using this XML to create a checkpoint of just vda on a qemu
+      domain with two disks and a prior checkpoint:</p>
+    <pre>
+&lt;domaincheckpoint&gt;
+  &lt;description&gt;Completion of updates after OS install&lt;/description&gt;
+  &lt;disks&gt;
+    &lt;disk name='vda' checkpoint='bitmap'/&gt;
+    &lt;disk name='vdb' checkpoint='no'/&gt;
+  &lt;/disks&gt;
+&lt;/domaincheckpoint&gt;</pre>
+
+    <p>will result in XML similar to this from
+      <code>virDomainCheckpointGetXMLDesc()</code>:</p>
+    <pre>
+&lt;domaincheckpoint&gt;
+  &lt;name&gt;1525889631&lt;/name&gt;
+  &lt;description&gt;Completion of updates after OS install&lt;/description&gt;
+  &lt;creationTime&gt;1525889631&lt;/creationTime&gt;
+  &lt;parent&gt;
+    &lt;name&gt;1525111885&lt;/name&gt;
+  &lt;/parent&gt;
+  &lt;disks&gt;
+    &lt;disk name='vda' checkpoint='bitmap' bitmap='1525889631'/&gt;
+    &lt;disk name='vdb' checkpoint='no'/&gt;
+  &lt;/disks&gt;
+  &lt;domain&gt;
+    &lt;name&gt;fedora&lt;/name&gt;
+    &lt;uuid&gt;93a5c045-6457-2c09-e56c-927cdf34e178&lt;/uuid&gt;
+    &lt;memory&gt;1048576&lt;/memory&gt;
+    ...
+    &lt;devices&gt;
+      &lt;disk type='file' device='disk'&gt;
+        &lt;driver name='qemu' type='qcow2'/&gt;
+        &lt;source file='/path/to/file1'/&gt;
+        &lt;target dev='vda' bus='virtio'/&gt;
+      &lt;/disk&gt;
+      &lt;disk type='file' device='disk' snapshot='external'&gt;
+        &lt;driver name='qemu' type='raw'/&gt;
+        &lt;source file='/path/to/file2'/&gt;
+        &lt;target dev='vdb' bus='virtio'/&gt;
+      &lt;/disk&gt;
+      ...
+    &lt;/devices&gt;
+  &lt;/domain&gt;
+&lt;/domaincheckpoint&gt;</pre>
+
+    <p>With that checkpoint created, the qcow2 image is now tracking
+      all changes that occur in the image since the checkpoint via
+      the persistent bitmap named <code>1525889631</code>.  Now, we
+      can make a subsequent call
+      to <code>virDomainBackupBegin()</code> to perform an incremental
+      backup of just this data, using the following XML to start a
+      pull model NBD export of the vda disk:
+    </p>
+    <pre>
+&lt;domainbackup mode="pull"&gt;
+  &lt;incremental&gt;1525889631&lt;/incremental&gt;
+  &lt;server transport="unix" socket="/path/to/server"/&gt;
+  &lt;disks/&gt;
+    &lt;disk name='vda' type='file'/&gt;
+      &lt;scratch file=/path/to/file1.scratch'/&gt;
+    &lt;/disk&gt;
+  &lt;/disks/&gt;
+&lt;/domainbackup&gt;
+    </pre>
+  </body>
+</html>
diff --git a/docs/schemas/domaincheckpoint.rng b/docs/schemas/domaincheckpoint.rng
new file mode 100644
index 0000000000..1e2c16e035
--- /dev/null
+++ b/docs/schemas/domaincheckpoint.rng
@@ -0,0 +1,89 @@
+<?xml version="1.0"?>
+<!-- A Relax NG schema for the libvirt domain checkpoint properties XML format -->
+<grammar xmlns="http://relaxng.org/ns/structure/1.0">
+  <start>
+    <ref name='domaincheckpoint'/>
+  </start>
+
+  <include href='domaincommon.rng'/>
+
+  <define name='domaincheckpoint'>
+    <element name='domaincheckpoint'>
+      <interleave>
+        <optional>
+          <element name='name'>
+            <text/>
+          </element>
+        </optional>
+        <optional>
+          <element name='description'>
+            <text/>
+          </element>
+        </optional>
+        <optional>
+          <element name='creationTime'>
+            <text/>
+          </element>
+        </optional>
+        <optional>
+          <element name='disks'>
+            <zeroOrMore>
+              <ref name='diskcheckpoint'/>
+            </zeroOrMore>
+          </element>
+        </optional>
+        <optional>
+          <choice>
+            <element name='domain'>
+              <element name='uuid'>
+                <ref name="UUID"/>
+              </element>
+            </element>
+            <!-- Nested grammar ensures that any of our overrides of
+                 storagecommon/domaincommon defines do not conflict
+                 with any domain.rng overrides.  -->
+            <grammar>
+              <include href='domain.rng'/>
+            </grammar>
+          </choice>
+        </optional>
+        <optional>
+          <element name='parent'>
+            <element name='name'>
+              <text/>
+            </element>
+          </element>
+        </optional>
+      </interleave>
+    </element>
+  </define>
+
+  <define name='diskcheckpoint'>
+    <element name='disk'>
+      <attribute name='name'>
+        <choice>
+          <ref name='diskTarget'/>
+          <ref name='absFilePath'/>
+        </choice>
+      </attribute>
+      <choice>
+        <attribute name='checkpoint'>
+          <value>no</value>
+        </attribute>
+        <group>
+          <optional>
+            <attribute name='checkpoint'>
+              <value>bitmap</value>
+            </attribute>
+          </optional>
+          <optional>
+            <attribute name='bitmap'>
+              <text/>
+            </attribute>
+          </optional>
+        </group>
+      </choice>
+    </element>
+  </define>
+
+</grammar>
diff --git a/libvirt.spec.in b/libvirt.spec.in
index ace05820aa..50bd79a7d7 100644
--- a/libvirt.spec.in
+++ b/libvirt.spec.in
@@ -2044,6 +2044,7 @@ exit 0
 %{_datadir}/libvirt/schemas/cputypes.rng
 %{_datadir}/libvirt/schemas/domain.rng
 %{_datadir}/libvirt/schemas/domaincaps.rng
+%{_datadir}/libvirt/schemas/domaincheckpoint.rng
 %{_datadir}/libvirt/schemas/domaincommon.rng
 %{_datadir}/libvirt/schemas/domainsnapshot.rng
 %{_datadir}/libvirt/schemas/interface.rng
diff --git a/mingw-libvirt.spec.in b/mingw-libvirt.spec.in
index 917d2143d8..6912527cf7 100644
--- a/mingw-libvirt.spec.in
+++ b/mingw-libvirt.spec.in
@@ -241,6 +241,7 @@ rm -rf $RPM_BUILD_ROOT%{mingw64_libexecdir}/libvirt-guests.sh
 %{mingw32_datadir}/libvirt/schemas/cputypes.rng
 %{mingw32_datadir}/libvirt/schemas/domain.rng
 %{mingw32_datadir}/libvirt/schemas/domaincaps.rng
+%{mingw32_datadir}/libvirt/schemas/domaincheckpoint.rng
 %{mingw32_datadir}/libvirt/schemas/domaincommon.rng
 %{mingw32_datadir}/libvirt/schemas/domainsnapshot.rng
 %{mingw32_datadir}/libvirt/schemas/interface.rng
@@ -326,6 +327,7 @@ rm -rf $RPM_BUILD_ROOT%{mingw64_libexecdir}/libvirt-guests.sh
 %{mingw64_datadir}/libvirt/schemas/cputypes.rng
 %{mingw64_datadir}/libvirt/schemas/domain.rng
 %{mingw64_datadir}/libvirt/schemas/domaincaps.rng
+%{mingw32_datadir}/libvirt/schemas/domaincheckpoint.rng
 %{mingw64_datadir}/libvirt/schemas/domaincommon.rng
 %{mingw64_datadir}/libvirt/schemas/domainsnapshot.rng
 %{mingw64_datadir}/libvirt/schemas/interface.rng
diff --git a/tests/domaincheckpointxml2xmlin/empty.xml b/tests/domaincheckpointxml2xmlin/empty.xml
new file mode 100644
index 0000000000..dc36449142
--- /dev/null
+++ b/tests/domaincheckpointxml2xmlin/empty.xml
@@ -0,0 +1 @@
+<domaincheckpoint/>
diff --git a/tests/domaincheckpointxml2xmlout/empty.xml b/tests/domaincheckpointxml2xmlout/empty.xml
new file mode 100644
index 0000000000..a26c7caab0
--- /dev/null
+++ b/tests/domaincheckpointxml2xmlout/empty.xml
@@ -0,0 +1,10 @@
+<domaincheckpoint>
+  <name>1525889631</name>
+  <creationTime>1525889631</creationTime>
+  <disks>
+    <disk name='vda' checkpoint='bitmap' bitmap='1525889631'/>
+  </disks>
+  <domain>
+    <uuid>9d37b878-a7cc-9f9a-b78f-49b3abad25a8</uuid>
+  </domain>
+</domaincheckpoint>
diff --git a/tests/virschematest.c b/tests/virschematest.c
index 2d35833919..b866db4326 100644
--- a/tests/virschematest.c
+++ b/tests/virschematest.c
@@ -223,6 +223,8 @@ mymain(void)
                 "genericxml2xmloutdata", "xlconfigdata", "libxlxml2domconfigdata",
                 "qemuhotplugtestdomains");
     DO_TEST_DIR("domaincaps.rng", "domaincapsschemadata");
+    DO_TEST_DIR("domaincheckpoint.rng", "domaincheckpointxml2xmlin",
+                "domaincheckpointxml2xmlout");
     DO_TEST_DIR("domainsnapshot.rng", "domainsnapshotxml2xmlin",
                 "domainsnapshotxml2xmlout");
     DO_TEST_DIR("interface.rng", "interfaceschemadata");
--
2.14.4