On 07.09.2010, at 15:41, Anthony Liguori wrote:
Hi,
We've got copy-on-read and image streaming working in QED and before going much
further, I wanted to bounce some interfaces off of the libvirt folks to make sure our
final interface makes sense.
Here's the basic idea:
Today, you can create images based on base images that are copy on write. With QED, we
also support copy on read which forces a copy from the backing image on read requests and
write requests.
In addition to copy on read, we introduce a notion of streaming a block device which
means that we search for an unallocated region of the leaf image and force a copy-on-read
operation.
The combination of copy-on-read and streaming means that you can start a guest based on
slow storage (like over the network) and bring in blocks on demand while also having a
deterministic mechanism to complete the transfer.
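To make the interaction between copy-on-read and streaming concrete, here is a toy in-memory sketch. It is not QED code: the class name, the whole-cluster granularity, and the list-based "images" are all assumptions made for illustration.

```python
# Toy model only: names and cluster granularity are assumptions, not QED internals.
class CopyOnReadImage:
    def __init__(self, backing, cluster_count):
        self.backing = backing                   # slow base image, one entry per cluster
        self.clusters = [None] * cluster_count   # None means unallocated in the leaf

    def read(self, index):
        # Copy-on-read: an unallocated cluster is fetched from the backing
        # image and stored in the leaf before the data is returned.
        if self.clusters[index] is None:
            self.clusters[index] = self.backing[index]
        return self.clusters[index]

    def write(self, index, data):
        # Whole-cluster write for simplicity; allocates the cluster in the leaf.
        self.clusters[index] = data

    def stream(self, start):
        # Find the first unallocated cluster at or after `start`, force a
        # copy-on-read on it, and return the index to continue from.
        for index in range(start, len(self.clusters)):
            if self.clusters[index] is None:
                self.read(index)
                return index + 1
        return len(self.clusters)

    def fully_allocated(self):
        return all(cluster is not None for cluster in self.clusters)


backing = [b"a", b"b", b"c", b"d"]
image = CopyOnReadImage(backing, len(backing))
image.read(2)                            # a guest read pulls in cluster 2 on demand
position = 0
while position < len(backing):
    position = image.stream(position)    # streaming fills in everything else
```

Once the loop finishes, every cluster lives in the leaf and the backing image is no longer needed, which is the deterministic completion described above.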
The interface for copy-on-read is just an option within qemu-img create. Streaming, on
the other hand, requires a bit more thought. Today, I have a monitor command that does
the following:
stream <device> <sector offset>
This tries to stream the minimal amount of data for a single I/O operation and then
returns how many sectors were successfully streamed.
The idea about how to drive this interface is a loop like:
    offset = 0;
    while offset < image_size:
        wait_for_idle_time()
        count = stream(device, offset)
        offset += count
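A runnable version of the loop above might look like the following sketch. The `stream()` function here is a hypothetical stand-in for the monitor command (it only pretends to copy a fixed chunk of sectors), and `wait_for_idle_time()` is a placeholder for whatever idle-detection policy ends up being used.

```python
import time


def stream(device, offset):
    """Hypothetical stand-in for the monitor command: returns sectors streamed."""
    remaining = device["sectors"] - offset
    return min(device["chunk"], remaining)


def wait_for_idle_time():
    """Placeholder: a real policy would look at host or storage load."""
    time.sleep(0)


def stream_whole_device(device):
    offset = 0
    calls = 0
    while offset < device["sectors"]:
        wait_for_idle_time()
        count = stream(device, offset)
        if count <= 0:
            # Guard against spinning forever if the command makes no progress.
            raise RuntimeError("stream made no progress")
        offset += count
        calls += 1
    return offset, calls


device = {"sectors": 1000, "chunk": 128}   # made-up sizes for the example
total, calls = stream_whole_device(device)
```

The progress check is an addition that the pseudocode does not have; without it, a command that returns 0 would leave the caller looping indefinitely.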
Obviously, the "wait_for_idle_time()" requires wide system awareness. The
thing I'm not sure about is whether libvirt would 1) want to expose a similar stream
interface and let management software determine idle time, or 2) attempt to detect idle
time on its own and provide a higher level interface. If (2), the question then becomes
whether we should try to do this within qemu and provide libvirt a higher level interface.
I'm torn here too. Why not expose both? Have a qemu internal daemon available that
gets a sleep time as parameter and an external "pull sectors" command. We'll
see which one is more useful, but I don't think it's too much code to justify only
having one of the two. And the internal daemon could be started using a command line
parameter, which helps non-managed users.
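As a rough sketch of the "expose both" idea: a background worker that takes a sleep time as its parameter and simply calls the same "pull sectors" primitive an external manager could call directly. The class and function names are hypothetical, and nothing here reflects how qemu would actually structure it.

```python
import threading


class StreamDaemon:
    """Calls pull_sectors(offset) repeatedly, sleeping between calls."""

    def __init__(self, pull_sectors, total_sectors, sleep_time):
        self.pull_sectors = pull_sectors      # same primitive an external caller would use
        self.total_sectors = total_sectors
        self.sleep_time = sleep_time          # seconds to wait between pulls
        self.offset = 0
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        while self.offset < self.total_sectors and not self._stop_event.is_set():
            count = self.pull_sectors(self.offset)
            if count <= 0:
                break
            self.offset += count
            # Event.wait doubles as an interruptible sleep.
            self._stop_event.wait(self.sleep_time)

    def stop(self):
        self._stop_event.set()
        self._thread.join()

    def join(self, timeout=None):
        self._thread.join(timeout)


def pull_sectors(offset):
    # Hypothetical external command: pretend to pull up to 64 sectors of 256.
    return min(64, 256 - offset)


daemon = StreamDaemon(pull_sectors, total_sectors=256, sleep_time=0.001)
daemon.start()
daemon.join(timeout=5)
```

Because the daemon is only a thin loop around `pull_sectors`, keeping both interfaces costs little extra code, which is the point being argued above.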
A related topic is block migration. Today we support pre-copy migration which means we
transfer the block device and then do a live migration. Another approach is to do the live
migration first, run a block server on the source, and use image streaming on the
destination to move the device.
With QED, to implement this one would:
1) launch qemu-nbd on the source while the guest is running
2) create a qed file on the destination with copy-on-read enabled and a backing file
using nbd: to point to the source qemu-nbd
3) run qemu -incoming on the destination with the qed file
4) execute the migration
5) when migration completes, begin streaming on the destination to complete the copy
6) when the streaming is complete, shut down the qemu-nbd instance on the source
This is a bit involved and we could potentially automate some of this in qemu by
launching qemu-nbd and providing commands to do some of this. Again though, I think the
question is what type of interfaces would libvirt prefer? Low level interfaces + recipes
on how to do high level things or higher level interfaces?
Is there anything keeping us from making the QMP socket multiplexable? I was thinking of
something like:
{ command = "nbd_server" ; block = "qemu_block_name" }
{ result = "done" }
<qmp socket turns into nbd socket>
This way we don't require yet another port, don't have to care about conflicts and
get internal qemu block names for free.
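To illustrate the handoff, here is a small self-contained sketch over a socket pair. It is neither real QMP syntax nor the NBD protocol: the JSON messages mirror the exchange above, and the "block server" phase is faked by sending the export's bytes and closing. All names are assumptions.

```python
import json
import socket
import threading


def read_line(sock):
    # Read byte by byte so nothing past the newline is consumed; the bytes
    # that follow belong to the raw (non-JSON) phase of the connection.
    data = bytearray()
    while not data.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            break
        data += chunk
    return bytes(data)


def serve(sock, exports):
    request = json.loads(read_line(sock))
    if request.get("command") == "nbd_server" and request.get("block") in exports:
        sock.sendall(json.dumps({"result": "done"}).encode() + b"\n")
        # From here on the socket carries raw block data instead of JSON.
        # Stand-in for a real block-server session: send the export and close.
        sock.sendall(exports[request["block"]])
    else:
        sock.sendall(json.dumps({"result": "error"}).encode() + b"\n")
    sock.close()


client, server = socket.socketpair()
exports = {"qemu_block_name": b"\x00" * 16}
worker = threading.Thread(target=serve, args=(server, exports))
worker.start()

command = {"command": "nbd_server", "block": "qemu_block_name"}
client.sendall(json.dumps(command).encode() + b"\n")
reply = json.loads(read_line(client))

payload = b""
while True:
    chunk = client.recv(4096)
    if not chunk:
        break
    payload += chunk

worker.join()
client.close()
```

The one detail that matters for multiplexing is visible in `read_line`: the control parser must not buffer past the end of the reply, otherwise the first bytes of the block stream are lost.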
Alex