Consider the case of a guest that has multiple virtual disks, some
residing on shared storage (such as the OS proper) and some on local
storage (scratch space, where the guest gets faster response because the
virtual disk does not have to go over the network, and possibly space
that the guest can live without if the disk is hot-unplugged). During
migration, you'd want different handling of the two kinds of disk (the
destination can already see the shared disk, but for the local disk it
must either copy the contents or recreate a blank scratch volume).
Or, consider the case where a guest has one disk as qcow2 (it is not
modified frequently, and benefits from sharing a common backing file
with other guests), while another disk is raw (for better read-write
performance). Right now, 'virsh snapshot' fails, because it only works
if all disks are qcow2; and in fact it may be desirable to take a
snapshot of only a subset of the domain's disks.
So, I think we need some way to request an operation on a subset of VM
disks, in a manner that can be shared between migration and volume
management APIs. And I'm not sure it makes sense to add two more
parameters to migration commands (an array of disks, and the size of
that array), nor to modify the snapshot XML to describe which disks
belong to the snapshot.
So I'm thinking we need some sort of API set to manage a stateful set of
disk operations. Maybe the trick is to define that every VM has a
(possibly empty) set of selected disks, with APIs to manage moving a
single disk in or out of the set, an API for listing the entire set,
and then a single flag to migration stating that live block migration is
attempted for all disks currently in the VM's selected disk set.
Being stateful, this would have to be represented in XML (so that if
libvirtd is restarted, it remembers which disks are selected); I'm
thinking of adding a new selected='yes|no' attribute to <disk>, as in:
<disk type='file' device='disk' selected='yes'>
  <driver name='qemu' type='raw'/>
  ...
</disk>
where if the attribute is absent, it defaults to no. For hypervisors
where the state is maintained by libvirtd (qemu, lxc), the XML works;
for other hypervisors, the notion of a subset of selected disks would
have to just fail unless there is some hypervisor-specific way to track
that information alongside a domain.
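
The defaulting rule for the proposed attribute can be sketched in a few
lines of C. This is only an illustration under stated assumptions:
parse_selected() and format_disk_open() are hypothetical helper names,
not libvirt functions, and the selected attribute itself is still just
a proposal.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: interpret the proposed selected='yes|no'
 * attribute.  attr is the attribute value, or NULL if the attribute
 * is absent from <disk>.  Returns 1 for selected, 0 for not selected
 * (including the absent case), -1 for an unrecognized value. */
static int parse_selected(const char *attr)
{
    if (attr == NULL)
        return 0;               /* absent defaults to 'no' */
    if (strcmp(attr, "yes") == 0)
        return 1;
    if (strcmp(attr, "no") == 0)
        return 0;
    return -1;
}

/* Hypothetical helper: format the opening <disk> tag, emitting the
 * attribute only when the disk is selected, so that XML for domains
 * with no selected disks is unchanged.  Returns snprintf()'s result. */
static int format_disk_open(char *buf, size_t len, int selected)
{
    return snprintf(buf, len, "<disk type='file' device='disk'%s>",
                    selected ? " selected='yes'" : "");
}
```

Omitting the attribute in the 'no' case is a design choice of this
sketch; always writing selected='no' would be equally consistent with
the default described above.
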
For my API proposal, I'm including an unused flags argument to all the
virDomainDiskSet* commands (experience has taught me well). In fact, we
could even use that flags parameter, to maintain parallel sets (set 0 is
the set of disks to migrate, set 1 is the set of disks to snapshot,
...), although I don't think we need that complexity yet (besides, it
would affect the proposed XML).
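
If parallel sets were ever wanted, one way to picture them is a
per-disk bitmask indexed by a set number taken from flags. The
following is purely a sketch of that idea; the struct, the function
names and the choice of a bitmask are assumptions made for
illustration, not part of the proposal.

```c
#include <limits.h>

/* Hypothetical: number of parallel sets a single mask can track. */
#define DISK_SET_MAX (sizeof(unsigned int) * CHAR_BIT)

/* Hypothetical per-disk state: bit N set means "member of set N". */
struct disk_membership {
    unsigned int sets;
};

/* Add the disk to set number 'set'; returns 0 on success, 1 if it
 * was already a member, -1 if the set number is out of range. */
static int membership_add(struct disk_membership *d, unsigned int set)
{
    if (set >= DISK_SET_MAX)
        return -1;
    if (d->sets & (1U << set))
        return 1;
    d->sets |= 1U << set;
    return 0;
}

/* Returns 1 if the disk is in set number 'set', 0 otherwise. */
static int membership_test(const struct disk_membership *d,
                           unsigned int set)
{
    return set < DISK_SET_MAX && (d->sets & (1U << set)) != 0;
}
```

A single selected='yes|no' attribute could not express this state,
which is the XML impact mentioned above.
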
/* Add disk to the domain's set of selected disks; flags ignored for
now; return 0 on success, 1 if already in the set, -1 on failure */
int virDomainDiskSetAdd(virDomainPtr dom, const char *disk,
                        unsigned int flags);
/* Remove disk from the domain's set of selected disks; flags ignored
for now; return 0 on success, 1 if already absent from the set, -1 on
failure */
int virDomainDiskSetRemove(virDomainPtr dom, const char *disk,
                           unsigned int flags);
/* Add all disks to the domain's set of selected disks; flags ignored
for now; return 0 on success, -1 on failure */
int virDomainDiskSetAddAll(virDomainPtr dom, unsigned int flags);
/* Remove all disks from the domain's set of selected disks; flags
ignored for now; return 0 on success, -1 on failure */
int virDomainDiskSetRemoveAll(virDomainPtr dom, unsigned int flags);
/* Return the size of the domain's currently selected disk set, or -1 on
failure; flags ignored for now */
int virDomainDiskSetSize(virDomainPtr dom, unsigned int flags);
/* Populate up to n entries of the array with the names of the domain's
selected disk set, and return how many entries were populated, or -1 on
failure; flags ignored for now */
int virDomainDiskSetList(virDomainPtr dom, char **array, int n,
                         unsigned int flags);
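
To make the proposed return values concrete, here is a toy in-memory
model of the set operations. It is a sketch only: struct model_domain
and the model_* functions are stand-ins invented for illustration, the
flags argument is dropped, "no such disk" is assumed to be the failure
case, and the list function hands back borrowed pointers rather than
deciding who owns the strings.

```c
#include <string.h>

#define MODEL_MAX_DISKS 8

/* Toy stand-in for a domain: disk names plus the proposed per-disk
 * "selected" state.  Not a libvirt type. */
struct model_domain {
    const char *disks[MODEL_MAX_DISKS];
    int selected[MODEL_MAX_DISKS];
    int ndisks;
};

/* Find a disk by name; returns its index, or -1 if unknown. */
static int model_find(const struct model_domain *dom, const char *disk)
{
    for (int i = 0; i < dom->ndisks; i++)
        if (strcmp(dom->disks[i], disk) == 0)
            return i;
    return -1;
}

/* Mirrors virDomainDiskSetAdd: 0 on success, 1 if already in the
 * set, -1 on failure (assumed here: no such disk). */
static int model_set_add(struct model_domain *dom, const char *disk)
{
    int i = model_find(dom, disk);
    if (i < 0)
        return -1;
    if (dom->selected[i])
        return 1;
    dom->selected[i] = 1;
    return 0;
}

/* Mirrors virDomainDiskSetRemove: 0 on success, 1 if already absent
 * from the set, -1 on failure (assumed here: no such disk). */
static int model_set_remove(struct model_domain *dom, const char *disk)
{
    int i = model_find(dom, disk);
    if (i < 0)
        return -1;
    if (!dom->selected[i])
        return 1;
    dom->selected[i] = 0;
    return 0;
}

/* Mirrors virDomainDiskSetSize: number of selected disks. */
static int model_set_size(const struct model_domain *dom)
{
    int count = 0;
    for (int i = 0; i < dom->ndisks; i++)
        if (dom->selected[i])
            count++;
    return count;
}

/* Mirrors virDomainDiskSetList: fill up to n entries of array and
 * return how many were filled. */
static int model_set_list(const struct model_domain *dom,
                          const char **array, int n)
{
    int filled = 0;
    for (int i = 0; i < dom->ndisks && filled < n; i++)
        if (dom->selected[i])
            array[filled++] = dom->disks[i];
    return filled;
}
```

A caller would typically ask for the size first, allocate an array of
that many entries, and then call the list function, accepting that the
set may change between the two calls.
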
With an API in place for tracking a subset of selected disks, we can then
extend existing APIs with new flags:
/* Old way - domain migration without any disks migrated */
virDomainMigrate(dom, dconn, flags | 0, dname, uri, bandwidth)
/* New way - domain migration, including all disks in the domain's
selected disk set being copied to the destination */
virDomainMigrate(dom, dconn, flags | VIR_MIGRATE_WITH_DISK_SET, dname,
uri, bandwidth)
/* Old way - snapshot of all disks */
virDomainSnapshotCreateXML(dom, xml, 0)
/* New way - snapshot of just disks in selected disk set */
virDomainSnapshotCreateXML(dom, xml, VIR_DOMAIN_SAVE_DISK_SET)
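
The old and new behaviours above boil down to one per-disk decision,
sketched below. The enum, the function name and the flag value are
placeholders for illustration; the proposal does not assign numeric
values to the new flags, and a single placeholder bit stands in for
both of them here.

```c
/* Placeholder bit standing in for the proposed "use the selected
 * disk set" flags; the real values are not defined by the proposal. */
#define SKETCH_USE_DISK_SET (1U << 0)

enum sketch_op {
    SKETCH_OP_MIGRATE,
    SKETCH_OP_SNAPSHOT
};

/* Returns 1 if a disk whose selected state is 'selected' takes part
 * in the operation, 0 if it is left alone. */
static int sketch_disk_included(enum sketch_op op, unsigned int flags,
                                int selected)
{
    if (flags & SKETCH_USE_DISK_SET)
        return selected != 0;   /* new way: only the selected set */

    /* Old way: migration copies no disks, snapshot covers all. */
    return op == SKETCH_OP_SNAPSHOT;
}
```

Without the flag, each operation keeps its existing behaviour, so
existing callers are unaffected.
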
I'd also like to see some collaboration between virDomainSave (for
memory) and virDomainSnapshotCreateXML (for disks); unfortunately,
virDomainSave doesn't take a flags argument. Maybe this calls for a new
API, and possibly a new version of the header to a 'virsh save' image to
track the location of snapshotted disks alongside the saved memory state:
/* Save the RAM state of domain to the base file "to". If "xml" is
NULL, no disks are snapshotted. Otherwise, "xml" is a snapshot XML that
describes how disk state will also be saved; if flags includes
VIR_DOMAIN_SAVE_DISK_SET, then the domain's selected disk set is
snapshotted, otherwise all disks are snapshotted. If flags contains
VIR_DOMAIN_SAVE_LIVE, then the guest is resumed after snapshot is
completed; otherwise the guest is halted. */
int virDomainSaveFlags(virDomainPtr dom, const char *to,
                       const char *xml, unsigned int flags);
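
The combinations described in that comment can be written down as a
small decision function. This is a sketch under assumptions: the
SKETCH_* bit values, the enum, the struct and sketch_plan_save() are
placeholders invented here, since the proposal names the flags but
does not number them.

```c
#include <stddef.h>

/* Placeholder bit values for the two proposed flags. */
#define SKETCH_SAVE_DISK_SET (1U << 0)
#define SKETCH_SAVE_LIVE     (1U << 1)

enum sketch_disks {
    SKETCH_DISKS_NONE,      /* no disks snapshotted */
    SKETCH_DISKS_SELECTED,  /* only the selected disk set */
    SKETCH_DISKS_ALL        /* every disk of the domain */
};

struct sketch_save_plan {
    enum sketch_disks disks;  /* which disks get snapshotted */
    int resume;               /* 1: guest resumed, 0: guest halted */
};

/* Work out what a save would do for a given xml/flags pair. */
static struct sketch_save_plan sketch_plan_save(const char *xml,
                                                unsigned int flags)
{
    struct sketch_save_plan plan;

    if (xml == NULL)
        plan.disks = SKETCH_DISKS_NONE;
    else if (flags & SKETCH_SAVE_DISK_SET)
        plan.disks = SKETCH_DISKS_SELECTED;
    else
        plan.disks = SKETCH_DISKS_ALL;

    plan.resume = (flags & SKETCH_SAVE_LIVE) != 0;
    return plan;
}
```

The sketch lets a NULL xml win over the disk-set flag, following the
order of the description; whether that combination should instead be
rejected as an error is left open.
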
Thoughts before I start implementing some of this for post-0.9.1?
--
Eric Blake eblake(a)redhat.com +1-801-349-2682
Libvirt virtualization library
http://libvirt.org