Re: [libvirt] [PATCHv4 05/18] blockjob: add new API flags

13 Apr 2012

      On 04/13/2012 03:38 AM, Jiri Denemark wrote:
...
On Mon, Apr 09, 2012 at 21:52:14 -0600, Eric Blake wrote:
...
This patch introduces a new block job, useful for live storage
migration using pre-copy streaming.
Using a live VM with the backing chain:
  base <- snap1 <- snap2
as the starting point, we have:
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0,
    VIR_DOMAIN_BLOCK_REBASE_COPY)
creates /path/to/copy with the same format as snap2, with no backing
file, so entire chain is copied and flattened
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0,
    VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_COPY_RAW)
creates /path/to/copy as a raw file, so entire chain is copied and
flattened
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0,
    VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_SHALLOW)
creates /path/to/copy with the same format as snap2, but with snap1 as
a backing file, so only snap2 is copied.
In other words, this doesn't do any rebase at all, it just copies snap2 into
copy and the result is the following chain:
    base <- snap1 <- copy
Did I get it right?
Yes, you got the correct end result.  I'll restate things in a slightly
different manner, to explain even more why I like the naming:

virDomainBlockRebase(,0) sets up a one-shot job that will eventually
rebase the current image to have a different backing file; when the job
completes (automatically), the rebase is complete (that is, the qemu
process now has a different backing chain in memory).  It is a property
of 'block_stream' that the job can be aborted and restarted at will (if
you abort at 20% pulled, then the next pull will start at 20%).

virDomainBlockRebase(,_REBASE_COPY) sets up a long-running job that
never completes without user intervention.  When the user completes the
job by use of virDomainBlockJobAbort(,_ABORT_PIVOT), then the rebase is
complete (that is, the qemu process now has a different backing chain in
memory).  The job can be aborted at will, but if you abort before the
streaming phase is complete, you have to start from scratch on the next
attempt (that is, if you abort 'drive-mirror' at 20%, you  have to
restart it at 0%).  More interesting is the case where you abort after
streaming is complete and you are now in the mirroring phase - in that
case, the copy is guaranteed to be consistent with the state of the
source at the time you reverted to the source.

But since both operations (with or without the flag) set up a block job
that starts a rebase, and it is the completion of the block job (whether
automatically or by user intervention) that finally commits qemu to
using the alternate backing chain in memory, both operations fit under
the generic name of causing a BlockRebase.
...
...
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0,
    VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)
reuse existing /path/to/copy (must have empty contents, and format is
probed from the metadata), and copy the full chain
Hmm, in the past we tried to avoid format probing for security reasons.
Shouldn't we avoid introducing new code that needs format probing? However,
I'm not sure that's doable at all in this case.
I definitely thought about that.  The security hold can be summarized as:

If you give the guest a raw image, but do not explicitly mark it as raw,
then the guest can trivially (as in, without having to crack qemu)
modify the header of the image in such a manner as to make a future
probe detect something other than raw.  For all other image types, there
is no way for a guest to fake the wrong image type (at least, not
without also cracking qemu, in which case you've got worse problems on
your hand), since the guest cannot corrupt the metadata that a probe is
reading.

Therefore, there are two ways to defeat this hole, and using either
method works in isolation to avoid the hole:

Mitigation #1: Never probe a raw image; any other type of image can be
safely probed once you have eliminated raw images.
Mitigation #2: Be explicit about the image format for all images.

You are asking for mitigation 2 (and see patch 16/18 where I give that
to you in a new API virDomainBlockCopy).  But I have already given you
mitigation 1: if you specify VIR_DOMAIN_BLOCK_REBASE_COPY_RAW, then you
are avoiding the probe by declaring the image to be raw, and for all
other image types, the probe is no longer a security hole.
...
...
Management applications can pre-create the copy with a relative
backing file name, and use the VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT
flag to have qemu reuse the metadata; if the management application
also copies the backing files to a new location, this can be used
to perform live storage migration of an entire backing chain.
This all sounds quite complicated but you explained it well in the
documentation block for virDomainBlockRebase. When reading the above, I
thought an example showing all steps needed to copy a disk image would be
useful but reading virDomainBlockRebase documentation clarified it enough that
I don't think the example is required :-)
I still ought to add another patch somewhere in the html documentation
that gives a good summary of it all, especially since it is a
multi-API-call sequence.  But that can be a later patch.
...
This all looks good and I think sticking this functionality into
virDomainBlockRebase makes sense. Since the qemu interface is not set in stone
yet, I won't formally ack any of these block copy patches :-)
Fair enough.  I need to rebase and post v5 anyways, so I'll apply tweaks
from your v4 review at that time.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org