On Wed, Apr 15, 2009 at 03:28:54PM -0400, Cole Robinson wrote:
Hi all,
Attached is a stab at an API for cloning a storage volume. This patch
implements the command for FS volumes, along with a virsh command
'vol-clone'. This builds on the previously posted 'drop pool lock during
volume creation' work.
The new API call looks like:
/**
* virStorageVolClone:
* @vol: pointer to storage vol to clone
* @xmldesc: description of volume to create
* @flags: flags for creation (unused, pass 0)
*
* Clone a storage volume within the parent pool.
* Information for the new volume (name, perms)
* are passed via a typical volume XML description
* Not all pools support cloning of volumes
*
* return the storage volume, or NULL on error
*/
virStorageVolPtr
virStorageVolClone(virStorageVolPtr vol,
const char *xmldesc,
unsigned int flags)
One potentially annoying restriction of this API is that it would not
allow for cloning between storage pools. eg clone an LVM volume to
a qcow2 file.
While I don't think we need to actually implement that kind of cross
pool cloning right now, I think it is worth allowing for it in the
API contract.
Thus I'd suggest taking the virStorageVolCreateXML() API as is and
then adding an extra 'clone' parameter
virStorageVolPtr virStorageVolCreateXMLFrom(virStoragePoolPtr pool,
const char *xmldesc,
unsigned int flags,
virStorageVolPtr clone);
One could argue we don't need this at all, and we should just add a 'copy'
API
int virStorageVolCopyData(virStorageVolPtr vol,
virStorageVolPtr from);
but being able to create + initialize data in one go might well allow for
some optimizations with some types of storage.
The user passes information about the new volume via <volume> XML.
Currently we only use the passed name and permissions values, but
backingStore info could also be relevant, and 'format' if we allow
cloning between image formats.
Current limitations:
- Parsing the 'clone' xml will complain if the 'capacity' element
hasn't
been specified, even though it is meaningless for the new volume. Not
too sure what the right thing to do here is.
I think it depends how you define the semantics of the clone operation.
My feeling is that this should follow the same rules as the normal
create call in all aspect, and that the 'clone' volume is just there
as a data source.
There's no reason the new volume has to be forced to be the same
capacity as the one being cloned. If its bigger that just means
there's some uninitialized data at the end which is perfectly OK.
The guest admin can extend the partition table / filesystem as
desired. If smaller you could just truncate copied data at that
point, though its more quesitonable whether that's useful.
- A clone of a raw sparse file will turn out fully allocated. Could
probably borrow code from 'cp' to try and get this right.
It is pretty easy to do a good approximation of sparseness. You're
copying in 1 MB chunks. So before write()ing the 1 MB out to the
target, just do a scan to see if its all zeros. If it is, then
do a lseek() instead. This is what virt-clone currently does.
- Doesn't handle backing stores. I think we would need to call
out to
'qemu-img convert' to have any chance of getting this right. And if we
do that, we could also allow cloning from one image format to another.
For non-raw formats you want to call out to qemu-img anyway because
you need to be able to have a larger target image. This will require
differnet metadata being written in the qcow/vmdk/etc headers, so a
plain byte-wise clone won't be sufficient.
Daniel
--
|: Red Hat, Engineering, London -o-
http://people.redhat.com/berrange/ :|
|:
http://libvirt.org -o-
http://virt-manager.org -o-
http://ovirt.org :|
|:
http://autobuild.org -o-
http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|