[libvirt] [PATCH 0/3 TCK] Some tests for snapshot management

All these tests fail with current GIT, but are intended to work with Eric's snapshot series applied. The first test in this list, however, does not pass. Eric's current tree forbids calling 'Destroy' on a transient guest which has snapshots present. IMHO this is wrong, because the guest may itself exit at any time, which leaves us in the same state as is 'Destroy' had been called wrt snapshots. IMHO, we should be allowed to call 'virDomainDestroy' on a transient guest with snapshots, and then later 'virDomainCreateXML' to re-create the guest and still have access to the snapshots In other words, just because a transient guest is not currently present, does not mean we should forget about its snapshots as a management app can re-create it again at any time. Forbidding virDomainUndefine with snapshots is more reasonable, but I think its still possible to argue that it should be allowed and that a later virDomainDefine with the same name+uuid should resurrect any previous snapshots.

From: "Daniel P. Berrange" <berrange@redhat.com> The test case validates the core lifecycle operations on persistent domains with snapshots. A transient domain should allow snapshots to be created while running. It should be possible to destory and restart the domain and still have snapshots present. --- scripts/domain-snapshot/050-transient-snapshot.t | 97 ++++++++++++++++++++++ 1 files changed, 97 insertions(+), 0 deletions(-) create mode 100644 scripts/domain-snapshot/050-transient-snapshot.t diff --git a/scripts/domain-snapshot/050-transient-snapshot.t b/scripts/domain-snapshot/050-transient-snapshot.t new file mode 100644 index 0000000..a314248 --- /dev/null +++ b/scripts/domain-snapshot/050-transient-snapshot.t @@ -0,0 +1,97 @@ +# -*- perl -*- +# +# Copyright (C) 2011 Red Hat, Inc. +# +# This program is free software; You can redistribute it and/or modify +# it under the GNU General Public License as published by the Free +# Software Foundation; either version 2, or (at your option) any +# later version +# +# The file "LICENSE" distributed along with this file provides full +# details of the terms and conditions +# + +=pod + +=head1 NAME + +domain/050-transient-snapshot.t - lifecycle of transient domains with snapshots + +=head1 DESCRIPTION + +The test case validates the core lifecycle operations on +persistent domains with snapshots. A transient domain +should allow snapshots to be created while running. It +should be possible to destory and restart the domain +and still have snapshots present. + +=cut + +use strict; +use warnings; + +use Test::More tests => 12; +use Test::Exception; +use Sys::Virt::TCK; +use Sys::Virt::TCK::DomainSnapshotBuilder; + +my $tck = Sys::Virt::TCK->new(); +my $conn = eval { $tck->setup(); }; +BAIL_OUT "failed to setup test harness: $@" if $@; +END { $tck->cleanup if $tck; } + +my $poolxml = $tck->generic_pool("dir")->as_xml; + +diag "Defining transient storage pool"; +my $pool; +ok_pool(sub { $pool = $conn->define_storage_pool($poolxml) }, "define transient storage pool"); +lives_ok(sub { $pool->build(0) }, "built storage pool"); +lives_ok(sub { $pool->create }, "started storage pool"); + +my $volxml = $tck->generic_volume("tck", "qcow2", 1024*1024*50)->as_xml; +my $vol; +ok_volume(sub { $vol = $pool->create_volume($volxml) }, "create qcow2 volume"); + +my $volpath = xpath($vol, "string(/volume/target/path)"); +diag "Vol path $volpath"; +my $domb = $tck->generic_domain("tck"); +$domb->{disks} = []; +my $domxml = $domb->disk(type => "file", src => $volpath, dst => "hdb", + format => {name => "qemu", type => "qcow2"})->as_xml; + +diag "Creating a new transient domain"; +my $dom; +ok_domain(sub { $dom = $conn->create_domain($domxml) }, "created transient domain object"); + + + +my $ssxml = Sys::Virt::TCK::DomainSnapshotBuilder->new(name => "ss1")->as_xml; + +my $domss; +diag "Taking a snapshot"; +ok_domain_snapshot(sub { $domss = $dom->create_snapshot($ssxml)}, "created domain snapshot"); + +lives_ok(sub { $dom->destroy }, "destroyed domain"); + +diag "Re-creating the domain"; +ok_domain(sub { $dom = $conn->create_domain($domxml) }, "created transient domain object"); + +diag "Looking up snapshot"; +$domss = undef; +lives_ok(sub { $domss = $dom->get_snapshot_by_name("ss1"); }, "snapshot still exists"); + +diag "Deleting snapshot"; +lives_ok(sub { $domss->delete }, "deleted snapshot"); + +diag "Destroying the transient domain"; +lives_ok(sub { $dom->destroy }, "destroyed domain"); + +diag "Checking that transient domain has gone away"; +ok_error(sub { $conn->get_domain_by_name("tck") }, "NO_DOMAIN error raised from missing domain", + Sys::Virt::Error::ERR_NO_DOMAIN); + +$vol->delete; +$pool->destroy; +$pool->delete; + +# end -- 1.7.6

On 08/26/2011 08:16 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange"<berrange@redhat.com>
The test case validates the core lifecycle operations on persistent domains with snapshots. A transient domain should allow snapshots to be created while running. It should be possible to destory and restart the domain
s/destory/destroy
and still have snapshots present.
See my reply to 0/3. I think this test is making an invalid assumption. If we leave stale snapshot metadata behind, then create a new domain with different uuid but same name, then we are setting ourselves up for confusion. We _have_ to clean up the metadata at some point in time, either when a new domain is created whose name happens to match stale metadata from a previous domain, or when the last reference to a transient domain disappears. But if we also make it possible to reinstate metadata, via VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE, then the app managing a transient domain will be able to reinstate everything needed to be able to safely revert back to the snapshot storage (when the prior transient domain shut down, it was only the metadata that was lost, not the actual storage). At any rate, thanks for making me think about this, and I guess I'd better get CREATE_REDEFINE working sooner rather than later. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> The test case validates the core lifecycle operations on persistent domains with snapshots. A persistent domain should allow snapshots to be created & deleted while both running and shutoff. It should forbid undefine of the domain when it has snapshots present. --- scripts/domain-snapshot/080-persistent-snapshot.t | 112 +++++++++++++++++++++ 1 files changed, 112 insertions(+), 0 deletions(-) create mode 100644 scripts/domain-snapshot/080-persistent-snapshot.t diff --git a/scripts/domain-snapshot/080-persistent-snapshot.t b/scripts/domain-snapshot/080-persistent-snapshot.t new file mode 100644 index 0000000..4064003 --- /dev/null +++ b/scripts/domain-snapshot/080-persistent-snapshot.t @@ -0,0 +1,112 @@ +# -*- perl -*- +# +# Copyright (C) 2011 Red Hat, Inc. +# +# This program is free software; You can redistribute it and/or modify +# it under the GNU General Public License as published by the Free +# Software Foundation; either version 2, or (at your option) any +# later version +# +# The file "LICENSE" distributed along with this file provides full +# details of the terms and conditions +# + +=pod + +=head1 NAME + +domain/080-persistent-snapshot.t - lifecycle of persistent domains with snapshots + +=head1 DESCRIPTION + +The test case validates the core lifecycle operations on +persistent domains with snapshots. A persistent domain +should allow snapshots to be created & deleted while both +running and shutoff. It should forbid undefine of the +domain when it has snapshots present. + +=cut + +use strict; +use warnings; + +use Test::More tests => 18; +use Test::Exception; +use Sys::Virt::TCK; +use Sys::Virt::TCK::DomainSnapshotBuilder; + +my $tck = Sys::Virt::TCK->new(); +my $conn = eval { $tck->setup(); }; +BAIL_OUT "failed to setup test harness: $@" if $@; +END { $tck->cleanup if $tck; } + +my $poolxml = $tck->generic_pool("dir")->as_xml; + +diag "Defining transient storage pool"; +my $pool; +ok_pool(sub { $pool = $conn->define_storage_pool($poolxml) }, "define transient storage pool"); +lives_ok(sub { $pool->build(0) }, "built storage pool"); +lives_ok(sub { $pool->create }, "started storage pool"); + +my $volxml = $tck->generic_volume("tck", "qcow2", 1024*1024*50)->as_xml; +my $vol; +ok_volume(sub { $vol = $pool->create_volume($volxml) }, "create qcow2 volume"); + +my $volpath = xpath($vol, "string(/volume/target/path)"); +diag "Vol path $volpath"; +my $domb = $tck->generic_domain("tck"); +$domb->{disks} = []; +my $domxml = $domb->disk(type => "file", src => $volpath, dst => "hdb", + format => {name => "qemu", type => "qcow2"})->as_xml; + +diag "Defining a new persistent domain"; +my $dom; +ok_domain(sub { $dom = $conn->define_domain($domxml) }, "defined persistent domain object"); + +my $ss1xml = Sys::Virt::TCK::DomainSnapshotBuilder->new(name => "ss1")->as_xml; +my $ss2xml = Sys::Virt::TCK::DomainSnapshotBuilder->new(name => "ss2")->as_xml; + +my $domss1; +my $domss2; +diag "Taking a snapshot while inactive"; +ok_domain_snapshot(sub { $domss2 = $dom->create_snapshot($ss1xml)}, "created domain snapshot ss1"); + +diag "Starting domain"; +lives_ok(sub { $dom->create }, "started domain"); + +diag "Taking a snapshot while active"; +ok_domain_snapshot(sub { $domss2 = $dom->create_snapshot($ss2xml)}, "created domain snapshot ss2"); + +lives_ok(sub { $dom->destroy }, "destroyed domain"); + +diag "Re-creating the domain"; +lives_ok(sub { $dom->create }, "recreated persistent domain object"); + +diag "Looking up snapshot"; +$domss1 = undef; +lives_ok(sub { $domss1 = $dom->get_snapshot_by_name("ss1"); }, "snapshot ss1 still exists"); +lives_ok(sub { $domss2 = $dom->get_snapshot_by_name("ss2"); }, "snapshot ss2 still exists"); + +diag "Deleting snapshot ss1"; +lives_ok(sub { $domss1->delete }, "deleted snapshot ss1"); + +diag "Destroying the transient domain"; +lives_ok(sub { $dom->destroy }, "destroyed domain"); + +diag "Trying to undefine with a snapshot present"; +ok_error(sub { $dom->undefine }, Sys::Virt::Error::ERR_OPERATION_INVALID); + +diag "Deleting snapshot ss2"; +lives_ok(sub { $domss2->delete }, "deleted snapshot ss2"); + +lives_ok(sub { $dom->undefine }, "undefined persistent domain"); + +diag "Checking that transient domain has gone away"; +ok_error(sub { $conn->get_domain_by_name("tck") }, "NO_DOMAIN error raised from missing domain", + Sys::Virt::Error::ERR_NO_DOMAIN); + +$vol->delete; +$pool->destroy; +$pool->delete; + +# end -- 1.7.6

On 08/26/2011 08:16 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange"<berrange@redhat.com>
The test case validates the core lifecycle operations on persistent domains with snapshots. A persistent domain should allow snapshots to be created& deleted while both running and shutoff.
This test is good, but could use more coverage - you need to test the following 9 transitions, to be sure we covered everything (because my series proved we had bugs in multiple of these transitions): current -> snapshot state ------------------ shutoff -> shutoff shutoff -> paused (0.9.4 buggy - it temporarily ran the vcpus, rather than restoring to the exact state where the paused snapshot was created) shutoff -> running paused -> shutoff (0.9.4 buggy with newer qemu, because it tried to use qemu -loadvm with no vm state) paused -> paused paused -> running (0.9.4 buggy - it left the restored domain paused) running -> shutoff (0.9.4 buggy with newer qemu, because it tried to use qemu -loadvm with no vm state) running -> paused running -> running It becomes 12 transitions when you add disk snapshots into the series, and once I implement VIR_DOMAIN_SNAPSHOT_REVERT_{START,PAUSED,FORCE}, we have even more combinations.
It should forbid undefine of the domain when it has snapshots present. --- scripts/domain-snapshot/080-persistent-snapshot.t | 112 +++++++++++++++++++++ 1 files changed, 112 insertions(+), 0 deletions(-) create mode 100644 scripts/domain-snapshot/080-persistent-snapshot.t
+my $domss1; +my $domss2; +diag "Taking a snapshot while inactive"; +ok_domain_snapshot(sub { $domss2 = $dom->create_snapshot($ss1xml)}, "created domain snapshot ss1"); + +diag "Starting domain"; +lives_ok(sub { $dom->create }, "started domain"); + +diag "Taking a snapshot while active"; +ok_domain_snapshot(sub { $domss2 = $dom->create_snapshot($ss2xml)}, "created domain snapshot ss2");
This only creates 2 of the 3 needed snapshots to fully exercise the possible 9 transitions. Also, I don't see any tests of reverting to a snapshot state - you appear to be testing the management of creation and deletion of snapshots, but not reversion. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

From: "Daniel P. Berrange" <berrange@redhat.com> The test case validates it recursive deletion of snapshots cleans up all children and leaves no orphans behind. --- .../domain-snapshot/100-delete-snapshot-children.t | 101 ++++++++++++++++++++ 1 files changed, 101 insertions(+), 0 deletions(-) create mode 100644 scripts/domain-snapshot/100-delete-snapshot-children.t diff --git a/scripts/domain-snapshot/100-delete-snapshot-children.t b/scripts/domain-snapshot/100-delete-snapshot-children.t new file mode 100644 index 0000000..3ebf2b9 --- /dev/null +++ b/scripts/domain-snapshot/100-delete-snapshot-children.t @@ -0,0 +1,101 @@ +# -*- perl -*- +# +# Copyright (C) 2011 Red Hat, Inc. +# +# This program is free software; You can redistribute it and/or modify +# it under the GNU General Public License as published by the Free +# Software Foundation; either version 2, or (at your option) any +# later version +# +# The file "LICENSE" distributed along with this file provides full +# details of the terms and conditions +# + +=pod + +=head1 NAME + +domain/100-delete-snapshot-children.t - recursive deletion of snapshot children + +=head1 DESCRIPTION + +The test case validates it recursive deletion of snapshots +cleans up all children and leaves no orphans behind. + +=cut + +use strict; +use warnings; + +use Test::More tests => 29; +use Test::Exception; +use Sys::Virt::TCK; +use Sys::Virt::TCK::DomainSnapshotBuilder; + +my $tck = Sys::Virt::TCK->new(); +my $conn = eval { $tck->setup(); }; +BAIL_OUT "failed to setup test harness: $@" if $@; +END { $tck->cleanup if $tck; } + +my $poolxml = $tck->generic_pool("dir")->as_xml; + +diag "Defining transient storage pool"; +my $pool; +ok_pool(sub { $pool = $conn->define_storage_pool($poolxml) }, "define transient storage pool"); +lives_ok(sub { $pool->build(0) }, "built storage pool"); +lives_ok(sub { $pool->create }, "started storage pool"); + +my $volxml = $tck->generic_volume("tck", "qcow2", 1024*1024*50)->as_xml; +my $vol; +ok_volume(sub { $vol = $pool->create_volume($volxml) }, "create qcow2 volume"); + +my $volpath = xpath($vol, "string(/volume/target/path)"); +diag "Vol path $volpath"; +my $domb = $tck->generic_domain("tck"); +$domb->{disks} = []; +my $domxml = $domb->disk(type => "file", src => $volpath, dst => "hdb", + format => {name => "qemu", type => "qcow2"})->as_xml; + +diag "Creating a new transient domain"; +my $dom; +ok_domain(sub { $dom = $conn->create_domain($domxml) }, "created transient domain object"); + + +my $ss; +for (my $i = 0 ; $i < 8 ; $i++) { + my $ssxml = Sys::Virt::TCK::DomainSnapshotBuilder->new(name => "ss$i")->as_xml; + ok_domain_snapshot(sub { $ss = $dom->create_snapshot($ssxml) }, "created domain snapshot ss$i"); +} + +lives_ok(sub { $ss = $dom->get_snapshot_by_name("ss4") }, "snapshot ss4 exists"); +lives_ok(sub { $ss->delete(Sys::Virt::DomainSnapshot::DELETE_CHILDREN) }, "deleted all snapshots from ss4 onwards"); + +for (my $i = 0 ; $i < 8 ; $i++) { + my $ss; + eval { $ss = $dom->get_snapshot_by_name("ss$i") }; + + if ($i < 4) { + ok_domain_snapshot(sub { $ss }, "snapshot ss$i still exists"); + } else { + ok(!defined $ss, "snapshot ss$i has been deleted"); + } +} + +for (my $i = 0 ; $i < 4 ; $i++) { + my $ss; + lives_ok(sub { $ss = $dom->get_snapshot_by_name("ss$i"); $ss->delete }, "deleted snapshot ss$i"); +} + + +diag "Destroying the transient domain"; +lives_ok(sub { $dom->destroy }, "destroyed domain"); + +diag "Checking that transient domain has gone away"; +ok_error(sub { $conn->get_domain_by_name("tck") }, "NO_DOMAIN error raised from missing domain", + Sys::Virt::Error::ERR_NO_DOMAIN); + +$vol->delete; +$pool->destroy; +$pool->delete; + +# end -- 1.7.6

On 08/26/2011 08:16 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange"<berrange@redhat.com>
The test case validates it recursive deletion of snapshots cleans up all children and leaves no orphans behind. --- .../domain-snapshot/100-delete-snapshot-children.t | 101 ++++++++++++++++++++ 1 files changed, 101 insertions(+), 0 deletions(-) create mode 100644 scripts/domain-snapshot/100-delete-snapshot-children.t
+ +The test case validates it recursive deletion of snapshots +cleans up all children and leaves no orphans behind.
Good, but incomplete. You also need to test handling of the current snapshot in relation to deletion of children. In the notation where * represents the current snapshot, and using 'parent -> child', if I have: s1 -> s2 -> s3 -> s4 -> *s5 then delete s2 alone, I should have: s1 -> s3 -> s4 -> *s5 then if I delete s3 and children, I should have: *s1 That is, I think your test should also be probing the name of the current snapshot along each phase of the test, to ensure that the current snapshot properly migrates up the tree to the parent of the snapshot subtree that just got deleted, since that was also buggy in 0.9.4.
+diag "Creating a new transient domain"; +my $dom; +ok_domain(sub { $dom = $conn->create_domain($domxml) }, "created transient domain object");
system checkpoints of running domains are a lot slower than those of inactive persistent guests (that is, 'qemu savevm' takes a lot longer than 'qemu-img snapshot -c'), if that matters at all to your test. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On 08/26/2011 08:16 AM, Daniel P. Berrange wrote:
All these tests fail with current GIT, but are intended to work with Eric's snapshot series applied.
The first test in this list, however, does not pass.
Eric's current tree forbids calling 'Destroy' on a transient guest which has snapshots present. IMHO this is wrong, because the guest may itself exit at any time, which leaves us in the same state as is 'Destroy' had been called wrt snapshots.
Hmm, interesting point. With managed save, we don't have that problem - the only way to have a managed save image is with a persistent def, so we can safely forbid undefine of a domain with managed save state.
IMHO, we should be allowed to call 'virDomainDestroy' on a transient guest with snapshots, and then later 'virDomainCreateXML' to re-create the guest and still have access to the snapshots
But leaving the snapshot metadata files behind is problematic. If you create a new domain with the same name but a different uuid, then those existing metadata files _will_ cause problems with the domain definition.
In other words, just because a transient guest is not currently present, does not mean we should forget about its snapshots as a management app can re-create it again at any time.
How about this compromise? I already documented in my RFC the desire to add a new flag, virDomainSnapshotCreateXML(, VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE), which will allow a management app to recreate a metadata file. I haven't yet implemented it, but think that it will solve this problem, as follows: Libvirt should not ever forbid destroy of a domain with snapshots. However, when a transient guest shuts down, then part of the cleanup that libvirt does for that guest is to remove all existing snapshot metadata (the snapshots themselves continue to exist, but the libvirt hierarchy of creation date, domain xml, and parent relationships is gone). On destroy, either the domain is persistent (so the snapshots will still be useful from the inactive guest), or the domain is transient (so the management app has to be aware of the ramifications). But the management app _already_ has to track domain xml independently if it plans on recreating a new transient guest from the same disk images, so it is not too much of a stretch to assume that the management app can also track snapshot xml. At which point, if the management app wants to recreate a new transient app with the same snapshot metadata, then it simply calls virDomainSnapshotCreateXML(, VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE) for each saved snapshot that it wants to revive in libvirt's tracking. Thus, snapshot metadata is never left behind to confuse the creation of a new guest (persistent domains fail to undefine until the snapshots are cleaned up, and transient domains are cleaned up automatically when the last reference to the domain disappears), but can easily be restored and thus tracked externally. However, this means that your first test would need tweaking - the test itself would have to create the snapshot, capture the snapshots xml, destroy the domain, then create a new domain with the same name and uuid, restore the snapshot, then revert to the snapshot, all to prove that redefining a transient domain does not lose the actual snapshot data, just the metadata that tracked the snapshot. There's also some FIXMEs in libvirt about the notion of extracting the name of all internal snapshots directly from qcow2 files, and exposing those to the user with minimal information. qcow2 lacks parent and domain xml information, so the regenerated snapshot xml would be weak and reverting to it would require the use of VIR_DOMAIN_SNAPSHOT_REVERT_FORCE; but making it easier to expose all known internal snapshot names, even when libvirt does not have the metadata behind the origin of those internal snapshots, might be useful via a new flag to virDomainSnapshotListNames.
Forbidding virDomainUndefine with snapshots is more reasonable,
Good, I'll keep it that way, since it is consistent with managed save images.
but I think its still possible to argue that it should be allowed and that a later virDomainDefine with the same name+uuid should resurrect any previous snapshots.
If resurrecting previous snapshots is important, then it should be done with VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Fri, Aug 26, 2011 at 08:57:54AM -0600, Eric Blake wrote:
On 08/26/2011 08:16 AM, Daniel P. Berrange wrote:
All these tests fail with current GIT, but are intended to work with Eric's snapshot series applied.
The first test in this list, however, does not pass.
Eric's current tree forbids calling 'Destroy' on a transient guest which has snapshots present. IMHO this is wrong, because the guest may itself exit at any time, which leaves us in the same state as is 'Destroy' had been called wrt snapshots.
Hmm, interesting point.
With managed save, we don't have that problem - the only way to have a managed save image is with a persistent def, so we can safely forbid undefine of a domain with managed save state.
IMHO, we should be allowed to call 'virDomainDestroy' on a transient guest with snapshots, and then later 'virDomainCreateXML' to re-create the guest and still have access to the snapshots
But leaving the snapshot metadata files behind is problematic. If you create a new domain with the same name but a different uuid, then those existing metadata files _will_ cause problems with the domain definition.
Surely this simply means that the snapshot metadata files shold be recording the orignal domain UUID, so that we can safely ignore/discard them if the guest is re-defined with the same name but new uuid ?
In other words, just because a transient guest is not currently present, does not mean we should forget about its snapshots as a management app can re-create it again at any time.
How about this compromise?
I already documented in my RFC the desire to add a new flag, virDomainSnapshotCreateXML(, VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE), which will allow a management app to recreate a metadata file. I haven't yet implemented it, but think that it will solve this problem, as follows:
Libvirt should not ever forbid destroy of a domain with snapshots. However, when a transient guest shuts down, then part of the cleanup that libvirt does for that guest is to remove all existing snapshot metadata (the snapshots themselves continue to exist, but the libvirt hierarchy of creation date, domain xml, and parent relationships is gone).
On destroy, either the domain is persistent (so the snapshots will still be useful from the inactive guest), or the domain is transient (so the management app has to be aware of the ramifications). But the management app _already_ has to track domain xml independently if it plans on recreating a new transient guest from the same disk images, so it is not too much of a stretch to assume that the management app can also track snapshot xml. At which point, if the management app wants to recreate a new transient app with the same snapshot metadata, then it simply calls virDomainSnapshotCreateXML(, VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE) for each saved snapshot that it wants to revive in libvirt's tracking.
Thus, snapshot metadata is never left behind to confuse the creation of a new guest (persistent domains fail to undefine until the snapshots are cleaned up, and transient domains are cleaned up automatically when the last reference to the domain disappears), but can easily be restored and thus tracked externally.
However, this means that your first test would need tweaking - the test itself would have to create the snapshot, capture the snapshots xml, destroy the domain, then create a new domain with the same name and uuid, restore the snapshot, then revert to the snapshot, all to prove that redefining a transient domain does not lose the actual snapshot data, just the metadata that tracked the snapshot.
There's also some FIXMEs in libvirt about the notion of extracting the name of all internal snapshots directly from qcow2 files, and exposing those to the user with minimal information. qcow2 lacks parent and domain xml information, so the regenerated snapshot xml would be weak and reverting to it would require the use of VIR_DOMAIN_SNAPSHOT_REVERT_FORCE; but making it easier to expose all known internal snapshot names, even when libvirt does not have the metadata behind the origin of those internal snapshots, might be useful via a new flag to virDomainSnapshotListNames.
Forbidding virDomainUndefine with snapshots is more reasonable,
Good, I'll keep it that way, since it is consistent with managed save images.
but I think its still possible to argue that it should be allowed and that a later virDomainDefine with the same name+uuid should resurrect any previous snapshots.
If resurrecting previous snapshots is important, then it should be done with VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE.
This is all workable, but is it overkill compared to just validating the original VM uuid against the new UUID when loading metdata ? Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 08/26/2011 09:03 AM, Daniel P. Berrange wrote:
IMHO, we should be allowed to call 'virDomainDestroy' on a transient guest with snapshots, and then later 'virDomainCreateXML' to re-create the guest and still have access to the snapshots
But leaving the snapshot metadata files behind is problematic. If you create a new domain with the same name but a different uuid, then those existing metadata files _will_ cause problems with the domain definition.
Surely this simply means that the snapshot metadata files shold be recording the orignal domain UUID,
They are. Even before my series, the original UUID was the _only_ thing about the domain that it was storing (my series fixes things to store the entire <domain> xml, which is needed for proper reversion to a snapshot, but at least old existing snapshots can already be recognized).
so that we can safely ignore/discard them if the guest is re-defined with the same name but new uuid ?
We have two options for when to nuke stale metadata - at the point when the domain goes down, or at the point when a new domain is created whose name differs from what was stored in the stale metadata. I guess you are leaning towards the latter - if a new domain is created with the right uuid, then the snapshots are already available; but if created with the same name but a new uuid, then we finally know to clean up the mess. But I'm not convinced that it scales well - if you create lots of transient domains, all with snapshots, but never reuse the same name, then the metadata directory keeps growing without bounds even though you never have that many domains at any given point of time. Besides, since the only way to use libvirt to delete snapshot metadata is via a virDomainSnapshotPtr object, but the only way to get at a snapshot is via a virDomainPtr object, I think it is risky to leave stale metadata in libvirt's directory trees that _cannot_ be removed by any existing libvirt API. Cleaning at the time the transient domain goes away is nicer for scalability. It means that nothing will be stored in /var/lib/libvirt/qemu/ unless it can also be accessed via a current virDomainPtr. It is also nicer for migration - remember, in my RFC, I mentioned how snapshots and migration interact - the only sane way is for the destination to start from a blank state, then recreate the same snapshot hierarchy as was present on the source alongside the migration, then remove the snapshot metadata on the source. If we keep snapshot metadata around even after a transient domain is no longer running on a host, then it becomes harder to start from a clean slate when migrating a new guest by that name onto the host.
If resurrecting previous snapshots is important, then it should be done with VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE.
This is all workable, but is it overkill compared to just validating the original VM uuid against the new UUID when loading metdata ?
I hope that we can come to agreement on whether or not making the management app of the transient domain have to track snapshot metadata counts as overkill or not. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Fri, Aug 26, 2011 at 09:15:43AM -0600, Eric Blake wrote:
On 08/26/2011 09:03 AM, Daniel P. Berrange wrote:
IMHO, we should be allowed to call 'virDomainDestroy' on a transient guest with snapshots, and then later 'virDomainCreateXML' to re-create the guest and still have access to the snapshots
But leaving the snapshot metadata files behind is problematic. If you create a new domain with the same name but a different uuid, then those existing metadata files _will_ cause problems with the domain definition.
Surely this simply means that the snapshot metadata files shold be recording the orignal domain UUID,
They are. Even before my series, the original UUID was the _only_ thing about the domain that it was storing (my series fixes things to store the entire <domain> xml, which is needed for proper reversion to a snapshot, but at least old existing snapshots can already be recognized).
so that we can safely ignore/discard them if the guest is re-defined with the same name but new uuid ?
We have two options for when to nuke stale metadata - at the point when the domain goes down, or at the point when a new domain is created whose name differs from what was stored in the stale metadata.
I guess you are leaning towards the latter - if a new domain is created with the right uuid, then the snapshots are already available; but if created with the same name but a new uuid, then we finally know to clean up the mess.
But I'm not convinced that it scales well - if you create lots of transient domains, all with snapshots, but never reuse the same name, then the metadata directory keeps growing without bounds even though you never have that many domains at any given point of time. Besides, since the only way to use libvirt to delete snapshot metadata is via a virDomainSnapshotPtr object, but the only way to get at a snapshot is via a virDomainPtr object, I think it is risky to leave stale metadata in libvirt's directory trees that _cannot_ be removed by any existing libvirt API.
Cleaning at the time the transient domain goes away is nicer for scalability. It means that nothing will be stored in /var/lib/libvirt/qemu/ unless it can also be accessed via a current virDomainPtr. It is also nicer for migration - remember, in my RFC, I mentioned how snapshots and migration interact - the only sane way is for the destination to start from a blank state, then recreate the same snapshot hierarchy as was present on the source alongside the migration, then remove the snapshot metadata on the source. If we keep snapshot metadata around even after a transient domain is no longer running on a host, then it becomes harder to start from a clean slate when migrating a new guest by that name onto the host.
If resurrecting previous snapshots is important, then it should be done with VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE.
This is all workable, but is it overkill compared to just validating the original VM uuid against the new UUID when loading metdata ?
I hope that we can come to agreement on whether or not making the management app of the transient domain have to track snapshot metadata counts as overkill or not.
I don't have a strong feeling on it, as long as we have a way to get back the snapshots when recreating a transient guest. So lets go with your suggestion. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
participants (2)
-
Daniel P. Berrange
-
Eric Blake