Re: [libvirt] [RFC] Storage volume clone API

22 Apr 2009


      On 04/16/2009 11:10 AM, Daniel P. Berrange wrote:
...
On Wed, Apr 15, 2009 at 03:28:54PM -0400, Cole Robinson wrote:
...
Hi all,
Attached is a stab at an API for cloning a storage volume. This patch
implements the command for FS volumes, along with a virsh command
'vol-clone'. This builds on the previously posted 'drop pool lock during
volume creation' work.
The new API call looks like:
/**
 * virStorageVolClone:
 * @vol: pointer to storage vol to clone
 * @xmldesc: description of volume to create
 * @flags: flags for creation (unused, pass 0)
 *
 * Clone a storage volume within the parent pool.
 * Information for the new volume (name, perms)
 * are passed via a typical volume XML description
 * Not all pools support cloning of volumes
 *
 * return the storage volume, or NULL on error
 */
virStorageVolPtr
virStorageVolClone(virStorageVolPtr vol,
                   const char *xmldesc,
                   unsigned int flags)
One potentially annoying restriction of this API is that it would not
allow for cloning between storage pools. eg clone an LVM volume to
a qcow2 file.
While I don't think we need to actually implement that kind of cross
pool cloning right now, I think it is worth allowing for it in the
API contract.
Thus I'd suggest taking the virStorageVolCreateXML() API as is and
then adding an extra 'clone' parameter
virStorageVolPtr        virStorageVolCreateXMLFrom(virStoragePoolPtr pool,
                                                   const char *xmldesc,
                                                   unsigned int flags,
                                                   virStorageVolPtr clone);
Ah sorry, I guess I didn't really consider the case of cloning between
storage pools. I think the above prototype looks good.
...
One could argue we don't need this at all, and we should just add a 'copy'
API
int virStorageVolCopyData(virStorageVolPtr vol,
                            virStorageVolPtr from);
but being able to create + initialize data in one go might well allow for
some optimizations with some types of storage.
Hmm, this is interesting. The above ties into something I've been
wondering about: an API to import to a volume from an arbitrary path.
This would allow virt-manager users an option to, say, move install
media from their homedir to a storage pool. But that's a separate
discussion.
...
...
The user passes information about the new volume via <volume> XML.
Currently we only use the passed name and permissions values, but
backingStore info could also be relevant, and 'format' if we allow
cloning between image formats.
Current limitations:
- Parsing the 'clone' xml will complain if the 'capacity' element hasn't
been specified, even though it is meaningless for the new volume. Not
too sure what the right thing to do here is.
I think it depends how you define the semantics of the clone operation.
My feeling is that this should follow the same rules as the normal
create call in all aspect, and that the 'clone' volume is just there
as a data source.
There's no reason the new volume has to be forced to be the same
capacity as the one being cloned. If its bigger that just means
there's some uninitialized data at the end which is perfectly OK.
The guest admin can extend the partition table / filesystem as
desired.  If smaller you could just truncate copied data at that
point, though its more quesitonable whether that's useful.
Sounds good.
...
...
- A clone of a raw sparse file will turn out fully allocated. Could
probably borrow code from 'cp' to try and get this right.
It is pretty easy to do a good approximation of sparseness. You're
copying in 1 MB chunks. So before write()ing the 1 MB out to the
target, just do a scan to see if its all zeros. If it is, then
do a lseek()  instead. This is what virt-clone currently does.
I'll give that a shot.
...
...
- Doesn't handle backing stores. I think we would need to call out to
'qemu-img convert' to have any chance of getting this right. And if we
do that, we could also allow cloning from one image format to another.
For non-raw formats you want to call out to qemu-img anyway because
you need to be able to have a larger target image. This will require
differnet metadata being written in the qcow/vmdk/etc headers, so a
plain byte-wise clone won't be sufficient.
Right, with the ability to specify a different capacity for the new
volume, using qemu-img is a must. I'll incorporate that into my next patch.

Thanks,
Cole