[libvirt] block pull/commit for non-local storage

Hi,
with my recent work on snapshots with native libgfapi support I've run into an issue with the libvirt APIs that management apps use to delete snapshots.
The management apps use these APIs to do the job:
int virDomainBlockCommit(virDomainPtr dom, const char *disk, const char *base, const char *top, unsigned long bandwidth, unsigned int flags)
int virDomainBlockPull(virDomainPtr dom, const char *disk, unsigned long bandwidth, unsigned int flags)
int virDomainBlockRebase(virDomainPtr dom, const char *disk, const char *base, unsigned long bandwidth, unsigned int flags)
As you can see in the prototypes of these functions (and from the docs for them which I'm not going to copy here) the user can provide the disk specification in two possible ways:
1) a full path if it's unambiguous - this is file centric and requires the file to be in the local filesystem
2) a disk name "vda" - this is selected automatically by libvirt but allows specifying only the top image.
For systems that want to use remote storage without local representation such as gluster+libgfapi, this doesn't allow using the APIs to start block jobs.
To solve this issue we need a way to specify paths on remote storage. Below are two options we've discussed on IRC.
1) Use URIs along with the file path to specify disk images.
This option would add new, possibly well documented URIs to specify paths for disk images. These would be libvirt defined URIs (but surprisingly "similar" to qemu URIs) so that hypervisors with different storage specification would need a conversion.
This would allow specifying the targets as:
vda - disk name
/path/to/file - legacy way, path
file:///path/to/file - new way of file paths
block:///dev/blah - new way, block devs
gluster://server/vol/img - new way, remote images
...
Possible caveats: RBD for example allows using multiple hosts, and we'd need to introduce a way to specify them if we added support for this on rbd.
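To make option 1 a little more concrete, here is a minimal sketch of how a target string could be sorted into the categories listed above. It is only an illustration: the enum, the function names and the decision to treat any unrecognised "scheme://" as unknown are assumptions made for this example, not libvirt code or a proposed API.

```c
#include <string.h>

/* Hypothetical categories for a block job target string. */
typedef enum {
    TARGET_DISK_NAME,   /* e.g. "vda" */
    TARGET_LOCAL_PATH,  /* "/path/to/file" or "file:///path/to/file" */
    TARGET_BLOCK_DEV,   /* "block:///dev/blah" */
    TARGET_GLUSTER,     /* "gluster://server/vol/img" */
    TARGET_UNKNOWN      /* some other "scheme://" this sketch does not know */
} target_kind;

/* Return non-zero if s starts with prefix. */
static int has_prefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Classify a target string; anything without "://" that is not an
 * absolute path is assumed to be a disk name such as "vda". */
target_kind classify_target(const char *target)
{
    if (target[0] == '/')
        return TARGET_LOCAL_PATH;          /* legacy way, plain path */
    if (has_prefix(target, "file://"))
        return TARGET_LOCAL_PATH;
    if (has_prefix(target, "block://"))
        return TARGET_BLOCK_DEV;
    if (has_prefix(target, "gluster://"))
        return TARGET_GLUSTER;
    if (strstr(target, "://") != NULL)
        return TARGET_UNKNOWN;
    return TARGET_DISK_NAME;
}
```

With this sketch, classify_target("vda") yields TARGET_DISK_NAME and classify_target("gluster://server/vol/img") yields TARGET_GLUSTER; a multi-host RBD target would still need extra syntax, which is the caveat raised above.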
2) Export the image chain in the XML and allow using indexed disk names
This option would require exporting the backing chain in the XML in some way, either as the existing disk source specification in multiple elements (which I don't like as it is a bit convoluted), or possibly again via URIs.
Then the user would be allowed to specify vda[2] for the second backing image of the vda disk.
With this the internal representation of the backing chain would be used without the need for the user to specify a path.
A possible caveat here is that if backing chains are for some reason converted to backing trees, this approach will be invalid.
3) ? anyone suggesting something better? :)
Thanks in advance for suggestions and/or new ideas.
Peter
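As a rough illustration of option 2, the sketch below splits an indexed disk name such as "vda[2]" into the disk name and the index. The function name, the return convention and the choice to report a plain "vda" as index 0 (the top image) are assumptions made for this example only.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Split "vda[2]" into name "vda" and index 2. A plain "vda" is accepted
 * and reported as index 0, which this sketch uses for the top image.
 * Returns 0 on success and -1 if the string is malformed or the name
 * does not fit into the caller's buffer. */
int parse_indexed_disk(const char *spec, char *name, size_t name_len,
                       unsigned int *index)
{
    const char *open = strchr(spec, '[');
    size_t len = open ? (size_t)(open - spec) : strlen(spec);
    char *end = NULL;
    unsigned long val;

    if (len == 0 || len >= name_len)
        return -1;
    memcpy(name, spec, len);
    name[len] = '\0';
    *index = 0;
    if (!open)
        return 0;

    /* Require at least one digit, then a closing ']' and nothing else. */
    if (!isdigit((unsigned char)open[1]))
        return -1;
    val = strtoul(open + 1, &end, 10);
    if (end[0] != ']' || end[1] != '\0' || val > UINT_MAX)
        return -1;
    *index = (unsigned int)val;
    return 0;
}
```

The index alone is enough to name an image here, which is why this approach does not need any path or URI syntax for remote storage.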

On Thu, Feb 20, 2014 at 02:46:03PM +0100, Peter Krempa wrote:
Hi,
with my recent work into snapshots with native libgfapi support I've run into an issue with libvirt APIs used to delete snapshots by management apps.
The management apps use the APIs to do the job:
int virDomainBlockCommit(virDomainPtr dom, const char *disk, const char *base, const char *top, unsigned long bandwidth, unsigned int flags)
int virDomainBlockPull(virDomainPtr dom, const char *disk, unsigned long bandwidth, unsigned int flags)
int virDomainBlockRebase(virDomainPtr dom, const char *disk, const char *base, unsigned long bandwidth, unsigned int flags)
As you can see in the prototypes of these functions (and from the docs for them which I'm not going to copy here) the user can provide the disk specification in two possible options:
1) a full path if it's unambiguous - this is file centric and requires the file to be in the local filesystem
2) a disk name "vda" - this is selected automatically by libvirt but allows to specify only the top image.
For systems that want to use remote storage without local representation such as gluster+libgfapi, this doesn't allow to use the APIs to start block jobs.
To solve this issue we need a way to specify paths on remote storage in some way. Below are two options we've discussed on IRC.
1) Use URIs along with the file path to specify disk images.
This option would add new, possibly well documented URIs to specify paths for disk images. These would be libvirt defined URIs (but surprisingly "similar" to qemu URIs) so that hypervisors with different storage specification would need a conversion.
This would allow to specify the targets as:
vda - disk name
/path/to/file - legacy way, path
file:///path/to/file - new way of file paths
block:///dev/blah - new way, block devs
gluster://server/vol/img - new way, remote images
...
And we can add our specific thing in there without (hopefully) breaking anything (e.g. new schemas, parameters).
Possible caveats: RBD for example allows to use multiple hosts and we'd need to introduce a possibility to specify it if we'd add support for this on rbd.
This could be example no. 1 of how we can work around something using our own URIs, for example:
"libvirt-rbd://main-host/path-or-volume?additional_hosts[]=host2&additional_hosts[]=host3"
even though this looks *very* ugly...
NB: if qemu supports this, we'll have to pass them a string anyway, won't we? So they'd do the job for us.
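For illustration, the extra hosts in such a URI could be pulled out of the query string roughly as follows. This is only a sketch: the "additional_hosts[]" parameter comes from the example URI above, while the function name and the fixed-size limits are hypothetical, and no percent-decoding is attempted.

```c
#include <string.h>

#define MAX_HOSTS 8   /* arbitrary limit for this sketch */
#define HOST_LEN 64   /* arbitrary buffer size for one host name */

/* Collect the value of every "additional_hosts[]=" parameter found in
 * the query part of uri. Returns the number of hosts stored, at most
 * MAX_HOSTS; values that do not fit into HOST_LEN are skipped. */
size_t collect_additional_hosts(const char *uri, char hosts[][HOST_LEN])
{
    static const char key[] = "additional_hosts[]=";
    const size_t key_len = sizeof(key) - 1;
    const char *p = strchr(uri, '?');
    size_t count = 0;

    if (!p)
        return 0;
    p++; /* skip the '?' */

    while (*p && count < MAX_HOSTS) {
        size_t param_len = strcspn(p, "&");

        if (param_len > key_len && strncmp(p, key, key_len) == 0) {
            size_t val_len = param_len - key_len;

            if (val_len < HOST_LEN) {
                memcpy(hosts[count], p + key_len, val_len);
                hosts[count][val_len] = '\0';
                count++;
            }
        }
        p += param_len;
        if (*p == '&')
            p++;
    }
    return count;
}
```

Run on the URI above, this would store "host2" and "host3"; the main host would still come from the authority part of the URI, which this sketch does not parse.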
2) Export the image chain in the XML and allow to use indexed disk names
This option would require to export the backing chain in the XML in some way, either the existing disk source specification in multiple elements (which I don't like as it is a bit convoluted), or possibly again via URIs.
Then the user would be allowed to specify vda[2] for the second backing image of the vda disk.
I thought about something similar; specifying "vda~1" or "vda~3" would be unique and still simple enough to use. I don't understand, however, why we would need to export the backing chain in the XML. If the management app wants to do a rebase, it must know the from and to points relative to the top image anyway. Or did I miss something?
With this the internal representations of the backing chain would be used without the need for the user to specify path.
A possible caveat here is that if backing chains for some reason will be converted to backing trees, this approach will be invalid.
We'd have this issue with URIs as well, but backing trees don't make sense with current qemu implementation anyway, if I understand that correctly.
3) ? anyone suggesting something better? :)
Thanks in advance for suggestions and/or new ideas.
You're welcome.
Peter
Martin

On Thu, Feb 20, 2014 at 02:46:03PM +0100, Peter Krempa wrote:
As you can see in the prototypes of these functions (and from the docs for them which I'm not going to copy here) the user can provide the disk specification in two possible options:
1) a full path if it's unambiguous - this is file centric and requires the file to be in the local filesystem
2) a disk name "vda" - this is selected automatically by libvirt but allows to specify only the top image.
For systems that want to use remote storage without local representation such as gluster+libgfapi, this doesn't allow to use the APIs to start block jobs.
To solve this issue we need a way to specify paths on remote storage in some way. Below are two options we've discussed on IRC.
1) Use URIs along with the file path to specify disk images.
This option would add new, possibly well documented URIs to specify paths for disk images. These would be libvirt defined URIs (but surprisingly "similar" to qemu URIs) so that hypervisors with different storage specification would need a conversion.
This would allow to specify the targets as:
vda - disk name
/path/to/file - legacy way, path
file:///path/to/file - new way of file paths
block:///dev/blah - new way, block devs
gluster://server/vol/img - new way, remote images
...
Possible caveats: RBD for example allows to use multiple hosts and we'd need to introduce a possibility to specify it if we'd add support for this on rbd.
Superficially this is appealing. What I dislike about it, however, is that it has no correspondence to the way you specify the targets in the XML file, so apps would need to know two different formats for dealing with network disks. The RBD multiple hosts issue is another concern.
2) Export the image chain in the XML and allow to use indexed disk names
This option would require to export the backing chain in the XML in some way, either the existing disk source specification in multiple elements (which I don't like as it is a bit convoluted), or possibly again via URIs.
Then the user would be allowed to specify vda[2] for the second backing image of the vda disk.
With this the internal representations of the backing chain would be used without the need for the user to specify path.
To me this is more appealing because of its simplicity. I think I would rather like us to expose the backing store info explicitly in the XML if we go this route, so that the index values are explicitly visible to apps using the XML.
A possible caveat here is that if backing chains for some reason will be converted to backing trees, this approach will be invalid.
One possibility if we ended up with forking trees would be to have multiple indexes, e.g. vda[1][2]. This gets a little more tedious to deal with though. Even if we have a tree, the backing files will have to be exposed in some order in the XML file, e.g. via a depth-first sort. Once we expose the files in the XML in this way, we do still have a single index value we can use: the value resulting from the depth-first sort.
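The depth-first numbering described above can be sketched as a pre-order walk that hands out consecutive index values. The struct layout, the MAX_CHILDREN limit and the function name are hypothetical and exist only for this illustration; starting the numbering at 0 for the top image is likewise an assumption.

```c
#include <stddef.h>

#define MAX_CHILDREN 4   /* arbitrary limit for this sketch */

/* Hypothetical node of a backing tree. */
struct image {
    const char *name;
    struct image *backing[MAX_CHILDREN]; /* backing images of this image */
    size_t nbacking;                     /* number of used backing slots */
    unsigned int index;                  /* filled in by assign_indexes() */
};

/* Depth-first (pre-order) walk: number this image, then each backing
 * subtree in order. Returns the next unused index value. */
unsigned int assign_indexes(struct image *img, unsigned int next)
{
    size_t i;

    img->index = next++;
    for (i = 0; i < img->nbacking; i++)
        next = assign_indexes(img->backing[i], next);
    return next;
}
```

For a plain chain this degenerates to 0, 1, 2, ... from the top image downwards, so a single index would keep working whether or not the structure ever forks.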
3) ? anyone suggesting something better? :)
Thanks in advance for suggestions and/or new ideas.
The only other option I'd have would be to actually use the XML snippet of the <source> element. Again this would require us to expose XML for all backing files. I think this would be a bit more ugly and not so user friendly compared to 2).
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 02/20/2014 07:37 AM, Daniel P. Berrange wrote:
2) Export the image chain in the XML and allow to use indexed disk names
This option would require to export the backing chain in the XML in some way, either the existing disk source specification in multiple elements (which I don't like as it is a bit convoluted), or possibly again via URIs.
Then the user would be allowed to specify vda[2] for the second backing image of the vda disk.
With this the internal representations of the backing chain would be used without the need for the user to specify path.
To me this is more appealing because of its simplicity. I think I would rather like us to expose the backing store info explicitly in the XML if we go this route, so that the index values are explicitly visible to apps using the XML.
As it is, I'd like to have the backing chain listed in XML for other reasons - I'm losing track of how many times people have complained that 'virsh blockpull' isn't working, only to discover that they forgot to set -obacking_fmt=qcow2 in their qemu-img calls that created their backing chain, so libvirt was treating the backing file as raw instead of as qcow2 for security reasons, and thus treating the chain as shorter than what qemu wants to do. But without an obvious way to export what libvirt thinks is the backing chain, it's harder to point this error out to end users.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
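The effect described above can be modelled with a very simplified sketch: walk the chain from the top image and stop as soon as a backing file has no recorded format, because that file is then treated as raw and a raw image has no backing file of its own. The struct and function below are hypothetical and only mirror the behaviour as described in this message; they are not how libvirt is implemented.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one image in a chain. chain[0] is the top
 * image and chain[i + 1] is the backing file of chain[i]. */
struct chain_entry {
    const char *path;
    const char *backing_fmt; /* format recorded for this image's backing
                              * file, or NULL if none was recorded */
};

/* Number of images visible when a backing file without a recorded
 * format is treated as raw, which ends the chain at that file. */
size_t visible_chain_length(const struct chain_entry *chain, size_t n)
{
    size_t i;

    if (n == 0)
        return 0;
    for (i = 0; i + 1 < n; i++) {
        const char *fmt = chain[i].backing_fmt;

        /* Images 0..i plus the backing file, which is taken as raw. */
        if (fmt == NULL || strcmp(fmt, "raw") == 0)
            return i + 2;
    }
    return n;
}
```

In this model a three-image chain whose top image lacks a recorded backing format is seen as only two images long, which matches the kind of confusing blockpull behaviour reported above.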

On 02/20/14 15:37, Daniel P. Berrange wrote:
On Thu, Feb 20, 2014 at 02:46:03PM +0100, Peter Krempa wrote:
As you can see in the prototypes of these functions (and from the docs for them which I'm not going to copy here) the user can provide the disk specification in two possible options:
1) a full path if it's unambiguous - this is file centric and requires the file to be in the local filesystem
2) a disk name "vda" - this is selected automatically by libvirt but allows to specify only the top image.
For systems that want to use remote storage without local representation such as gluster+libgfapi, this doesn't allow to use the APIs to start block jobs.
To solve this issue we need a way to specify paths on remote storage in some way. Below are two options we've discussed on IRC.
1) Use URIs along with the file path to specify disk images.
This option would add new, possibly well documented URIs to specify paths for disk images. These would be libvirt defined URIs (but surprisingly "similar" to qemu URIs) so that hypervisors with different storage specification would need a conversion.
This would allow to specify the targets as:
vda - disk name
/path/to/file - legacy way, path
file:///path/to/file - new way of file paths
block:///dev/blah - new way, block devs
gluster://server/vol/img - new way, remote images
...
Possible caveats: RBD for example allows to use multiple hosts and we'd need to introduce a possibility to specify it if we'd add support for this on rbd.
Superficially this is appealing. What I dislike about it, however, is that it has no correspondence to the way you specify the targets in the XML file, so apps would need to know two different formats for dealing with network disks. The RBD multiple hosts issue is another concern.
2) Export the image chain in the XML and allow to use indexed disk names
This option would require to export the backing chain in the XML in some way, either the existing disk source specification in multiple elements (which I don't like as it is a bit convoluted), or possibly again via URIs.
Then the user would be allowed to specify vda[2] for the second backing image of the vda disk.
With this the internal representations of the backing chain would be used without the need for the user to specify path.
To me this is more appealing because of its simplicity. I think I would rather like us to expose the backing store info explicitly in the XML if we go this route, so that the index values are explicitly visible to apps using the XML.
Ok, for the block chain operations this seems to be the most viable idea here.
One further thing we should discuss is the block copy job, where we need to specify a new path that is not part of the backing chain of the disk where the disk gets copied (and effectively becomes the new single element of the backing chain). The API for this operation has a very similar interface which we need to figure out too sooner or later.
Peter

On 02/24/2014 08:48 AM, Peter Krempa wrote:
One further thing we should discuss is the block copy job, where we need to specify a new path that is not part of the backing chain of the disk where the disk gets copied (and effectively becomes the new single element of the backing chain). The API for this operation has a very similar interface which we need to figure out too sooner or later.
For that interface, I wonder if the best approach is to add a new flag. By default, when the flag is 0, the new disk string is treated as a path name in the local file system. But when the flag is set, the new disk string is treated as an XML document describing the full <disk> details, which gives us the full flexibility for a volume within a storage pool or the full details of a network device such as gluster, or even a network device that has multiple <host> subelements.
--
Eric Blake

On 02/24/2014 09:53 AM, Eric Blake wrote:
On 02/24/2014 08:48 AM, Peter Krempa wrote:
One further thing we should discuss is the block copy job, where we need to specify a new path that is not part of the backing chain of the disk where the disk gets copied (and effectively becomes the new single element of the backing chain). The API for this operation has a very similar interface which we need to figure out too sooner or later.
For that interface, I wonder if the best approach is to add a new flag. By default, when the flag is 0, the new disk string is treated as a path name in the local file system. But when the flag is set, the new disk string is treated as an XML document describing the full <disk> details, which gives us the full flexibility for a volume within a storage pool or the full details of a network device such as gluster, or even a network device that has multiple <host> subelements.
[I hit send too soon]
That is, the shorthand of "vda[1]" or "vda[2]" for referring to elements already in the existing block chain works nicely for blockpull and blockcommit; and for blockcopy, reusing existing XML notations for specifying a network destination, the same way we just recently taught snapshots to reuse XML notations, seems like the best way for designating the new location. And since we were smart enough to have a flag argument, I'm fine with using the flag argument for the determination of whether a file string is a local filename vs. an XML <disk> designation.
--
Eric Blake
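The flag-based idea above could look roughly like the following sketch. The flag name, the enum and the function are hypothetical and are not existing libvirt constants or APIs; the XML check is only a cheap sanity test standing in for a real parser.

```c
#include <string.h>

/* Hypothetical flag for this sketch; not an existing libvirt constant. */
#define EXAMPLE_BLOCK_COPY_DEST_XML (1u << 0)

/* How the destination string of a block copy would be interpreted. */
typedef enum {
    DEST_LOCAL_PATH, /* flag clear: path name in the local file system */
    DEST_DISK_XML,   /* flag set: XML document describing a <disk> */
    DEST_INVALID     /* flag set, but the string is not a <disk> element */
} dest_kind;

/* Decide how to treat dest based on flags. With the flag clear the
 * string is always a local path, which keeps existing callers working. */
dest_kind interpret_destination(const char *dest, unsigned int flags)
{
    if (flags & EXAMPLE_BLOCK_COPY_DEST_XML) {
        /* A real implementation would parse the XML; this sketch only
         * checks that the string starts with a <disk element. */
        return strncmp(dest, "<disk", 5) == 0 ? DEST_DISK_XML : DEST_INVALID;
    }
    return DEST_LOCAL_PATH;
}
```

The sketch also shows the trade-off: one string argument ends up carrying two unrelated meanings depending on a flag.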

On Mon, Feb 24, 2014 at 09:53:59AM -0700, Eric Blake wrote:
On 02/24/2014 08:48 AM, Peter Krempa wrote:
One further thing we should discuss is the block copy job, where we need to specify a new path that is not part of the backing chain of the disk where the disk gets copied (and effectively becomes the new single element of the backing chain). The API for this operation has a very similar interface which we need to figure out too sooner or later.
For that interface, I wonder if the best approach is to add a new flag. By default, when the flag is 0, the new disk string is treated as a path name in the local file system. But when the flag is set, the new disk string is treated as an XML document describing the full <disk> details, which gives us the full flexibility for a volume within a storage pool or the full details of a network device such as gluster, or even a network device that has multiple <host> subelements.
If we want to allow XML instead of a path, then I'd suggest we really should create a new API instead of overloading the semantics of 'path'.
Regards,
Daniel
participants (4)
- Daniel P. Berrange
- Eric Blake
- Martin Kletzander
- Peter Krempa