[libvirt] Notes: Non-shared storage live migration w/ active blockcommit

These notes are based on an IRC conversation with Eric Blake about efficient non-shared storage live migration. Thought I'd post my notes here before I forget. Please review and point out any inaccuracies.

Procedure
---------

(1) Starting from disk A, create a snapshot A <- A':

    $ virsh snapshot-create-as \
        --domain f20vm snap1 snap1-desc \
        --diskspec hda,file=/export/vmimages/A'.qcow2 \
        --disk-only --atomic

(2) Background copy of A to B:

    $ virsh blockcopy \
        --domain vm1 vda /export/vmimages/B.qcow2 \
        --wait --verbose --shallow \
        --finish

(3) Create an empty B' with backing file B:

    $ qemu-img create -f qcow2 -b B.qcow2 \
        -o backing_fmt=qcow2 B'.qcow2

[or]

    $ virsh vol-create-as default B'.qcow2 1G \
        --format qcow2 \
        --backing-vol B.qcow2 --backing-vol-format qcow2

(4) Do a shallow blockcopy of A' to B':

    $ virsh blockcopy \
        --domain vm1 vda /export/vmimages/B'.qcow2 \
        --wait --verbose --shallow \
        --finish

(5) Then live shallow commit of B:

    $ virsh blockcommit \
        --domain f20vm vda \
        --wait --verbose --shallow \
        --pivot --active --finish
    Block Commit: [100 %]
    Successfully pivoted

-- /kashyap

On Thu, Sep 25, 2014 at 07:56:09PM +0530, Kashyap Chamarthy wrote:
> These notes are based on an IRC conversation with Eric Blake about efficient non-shared storage live migration. Thought I'd post my notes here before I forget. Please review and point out any inaccuracies.

Noting a couple of things I missed...

> Procedure
> ---------
>
> (1) Starting from disk A, create a snapshot A <- A':
>
>     $ virsh snapshot-create-as \
>         --domain f20vm snap1 snap1-desc \
>         --diskspec hda,file=/export/vmimages/A'.qcow2 \
>         --disk-only --atomic

Before performing a live blockcopy, make the domain transient (as persistent dirty bitmap support is yet to arrive in QEMU).

Take a backup of the guest XML:

    $ virsh dumpxml f20vm > /var/tmp/f20vm.xml

Undefine the running guest, thus turning it into a transient guest:

    $ virsh undefine f20vm
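
The two commands above, plus the reverse step once the block jobs are done, could be wrapped in a small script. This is only a minimal sketch and not something from the thread: the helper names (`run`, `make_transient`, `make_persistent`) and the `DRY_RUN` switch are hypothetical, and restoring the definition afterwards with `virsh define` is an assumption about how one would make the guest persistent again.

```shell
#!/bin/sh
# Hypothetical helpers; DRY_RUN=1 (the default) only prints the commands.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Save the guest XML, then undefine the running guest so it becomes transient.
make_transient() {
    dom=$1
    xml=$2
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ virsh dumpxml $dom > $xml"
    else
        virsh dumpxml "$dom" > "$xml" || return 1
    fi
    run virsh undefine "$dom"
}

# Assumption: re-define the guest from the saved XML after the block jobs finish.
make_persistent() {
    run virsh define "$1"
}

make_transient f20vm /var/tmp/f20vm.xml
make_persistent /var/tmp/f20vm.xml
```

With the default `DRY_RUN=1` the script only lists what it would do; setting `DRY_RUN=0` runs the commands against the guest named in the thread (f20vm).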

> (2) Background copy of A to B:
>
>     $ virsh blockcopy \
>         --domain vm1 vda /export/vmimages/B.qcow2 \
>         --wait --verbose --shallow \
>         --finish
>
> (3) Create an empty B' with backing file B:
>
>     $ qemu-img create -f qcow2 -b B.qcow2 \
>         -o backing_fmt=qcow2 B'.qcow2
>
> [or]
>
>     $ virsh vol-create-as default B'.qcow2 1G \
>         --format qcow2 \
>         --backing-vol B.qcow2 --backing-vol-format qcow2
>
> (4) Do a shallow blockcopy of A' to B':
>
>     $ virsh blockcopy \
>         --domain vm1 vda /export/vmimages/B'.qcow2 \
>         --wait --verbose --shallow \
>         --finish

Since a chain was already created in step (3), I should have used the '--reuse-external' flag for this shallow blockcopy.

> (5) Then live shallow commit of B:
>
>     $ virsh blockcommit \
>         --domain f20vm vda \
>         --wait --verbose --shallow \
>         --pivot --active --finish
>     Block Commit: [100 %]
>     Successfully pivoted

-- /kashyap

On Thu, Sep 25, 2014 at 3:26 PM, Kashyap Chamarthy <kchamart@redhat.com> wrote:
> These notes are based on an IRC conversation with Eric Blake about efficient non-shared storage live migration. Thought I'd post my notes here before I forget. Please review and point out any inaccuracies.

What are you trying to achieve?

Stefan

On Mon, Sep 29, 2014 at 10:33:37AM +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 25, 2014 at 3:26 PM, Kashyap Chamarthy <kchamart@redhat.com> wrote:
> > These notes are based on an IRC conversation with Eric Blake about efficient non-shared storage live migration. Thought I'd post my notes here before I forget. Please review and point out any inaccuracies.
>
> What are you trying to achieve?

Hmm, I clearly failed to articulate what I was trying to do then. Allow me to rephrase.

So, this was just an attempt out of curiosity, derived from an IRC conversation with Eric Blake on #virt (OFTC) some months ago: once libvirt starts using QEMU 2.0's (block) commit of the active layer of a disk image chain into its immediate base or one of the intermediate images, live migration with non-shared storage should be more efficient (with a combination of blockcopy + blockcommit). Now that all the relevant bits have been in place for a while, I was just trying to test how that works out.

  - Relevant git commit from libvirt: 318cfabdb442f41c9a7016889526c67aad26a503 (blockcommit: document semantics of committing active layer)
  - Relevant git commit from QEMU: 20a63d2cec838c2dde4d246c4d7abe747d9b7a11 (commit: Support commit active layer)

Hope that explains it a bit more clearly.

More context: I intend to do a user/admin-style talk at the upcoming LinuxCon/CloudOpen in Düsseldorf, so I was just testing and experimenting to see what practical steps users/administrators of virt/cloud software (like OpenStack) can take to benefit from the newer features in libvirt/QEMU when dealing with snapshots and large disk image chains/merges. To keep things less complicated, I'm mainly tinkering with file-based snapshots.

https://kashyapc.fedorapeople.org/virt/lcce-2014/Abstract-CloudOpen-Eu-2014....

-- /kashyap

On 09/25/2014 08:26 AM, Kashyap Chamarthy wrote:
> These notes are based on an IRC conversation with Eric Blake about efficient non-shared storage live migration. Thought I'd post my notes here before I forget. Please review and point out any inaccuracies.
>
> Procedure
> ---------
>
> (1) Starting from disk A, create a snapshot A <- A':
>
>     $ virsh snapshot-create-as \
>         --domain f20vm snap1 snap1-desc \
>         --diskspec hda,file=/export/vmimages/A'.qcow2 \
>         --disk-only --atomic

If you are using this snapshot only for the side-effect of growing the chain, you can add --no-metadata here instead of deleting the snapshot later when it gets invalidated [1]. Of course, if you pass --no-metadata, the snapshot name (snap1) and description (snap1-desc) are no longer important.

> (2) Background copy of A to B:
>
>     $ virsh blockcopy \
>         --domain vm1 vda /export/vmimages/B.qcow2 \
>         --wait --verbose --shallow \
>         --finish

This step is not quite right. You are asking for a shallow copy of the current file for disk 'vda' (that is, A'.qcow2). But that is NOT the same as the base A image. For this step, libvirt does not yet have an easy way to access the contents of a backing chain of a live domain; you CAN use 'virsh vol-*' commands to do a background copy from storage pools, but it may be easier to just resort to normal file system tools:

    cp /export/vmimages/A.qcow2 /export/vmimages/B.qcow2

or even rely on storage-array-specific commands to set up a trivial clone with no real time overhead (for example, some iSCSI storage arrays allow efficient copy-on-write cloning of storage volumes by creating a new name that shares the same original contents of A.qcow2 as its starting point; and since we are about to delete A.qcow2 later on, we never need any actual data copying).
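
For file-based images, the "normal file system tools" route for step (2) could be sketched as below. This is a minimal, hedged example rather than anything from the thread: `copy_base` is a hypothetical helper name, and the `--reflink=auto` option assumes GNU cp on a file system that supports copy-on-write clones; where the option is not available the helper falls back to an ordinary copy.

```shell
#!/bin/sh
# copy_base SRC DST: copy a backing image that the guest no longer writes to.
# Tries a copy-on-write clone first (GNU cp --reflink=auto, an assumption about
# the host's cp) and falls back to a plain copy if that invocation fails.
copy_base() {
    src=$1
    dst=$2
    cp --reflink=auto -- "$src" "$dst" 2>/dev/null || cp -- "$src" "$dst"
}

# Example with the paths used in the thread:
#   copy_base /export/vmimages/A.qcow2 /export/vmimages/B.qcow2
```

The clone attempt is only an optimization; the resulting B.qcow2 has the same contents either way.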

> (3) Create an empty B' with backing file B:
>
>     $ qemu-img create -f qcow2 -b B.qcow2 \
>         -o backing_fmt=qcow2 B'.qcow2
>
> [or]
>
>     $ virsh vol-create-as default B'.qcow2 1G \
>         --format qcow2 \
>         --backing-vol B.qcow2 --backing-vol-format qcow2

[side note - we should really teach libvirt to not REQUIRE a size when creating an empty wrapper around an existing image]

> (4) Do a shallow blockcopy of A' to B':
>
>     $ virsh blockcopy \
>         --domain vm1 vda /export/vmimages/B'.qcow2 \
>         --wait --verbose --shallow \
>         --finish

For this to work, you need to also use the --reuse-external flag to take advantage of the backing chain already recorded in B'.qcow2 (without the flag, the command will complain that B'.qcow2 already exists if it is a regular file; if it is a block device, it will just silently ignore the contents of the block device and treat B'.qcow2 as though an absolute path to A.qcow2 were its backing file).
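
One way to avoid forgetting the flag is to let a small helper decide. The sketch below is hypothetical (`blockcopy_cmd` is not part of virsh): it prints the step (4) command for review, adding `--reuse-external` only when the destination already exists, and otherwise keeps the flags exactly as quoted in the thread, including `--finish`. It only prints and does no shell quoting, so a path such as B'.qcow2 would need quoting before the output is pasted into a shell.

```shell
#!/bin/sh
# blockcopy_cmd DOMAIN DISK DEST: print the shallow blockcopy command.
# --reuse-external is added when DEST already exists (a pre-created qcow2
# wrapper from step (3), or a block device), so its backing chain is honoured.
blockcopy_cmd() {
    domain=$1
    disk=$2
    dest=$3
    flags="--wait --verbose --shallow"
    if [ -e "$dest" ]; then
        flags="$flags --reuse-external"
    fi
    printf 'virsh blockcopy --domain %s %s %s %s --finish\n' \
        "$domain" "$disk" "$dest" "$flags"
}

# Example:
#   blockcopy_cmd vm1 vda /export/vmimages/Bprime.qcow2
```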

> (5) Then live shallow commit of B:
>
>     $ virsh blockcommit \
>         --domain f20vm vda \
>         --wait --verbose --shallow \
>         --pivot --active --finish
>     Block Commit: [100 %]
>     Successfully pivoted

With steps 2 and 4 corrected, this indeed shortens the chain back down to just B.qcow2. And once this happens, you no longer need the path to A.qcow2 or A'.qcow2; you can also delete B'.qcow2.

But back to the point I made earlier at [1]: if this is all you do, then 'virsh snapshot-list' will still show 'snap1' as a snapshot that tries to refer to A'.qcow2; since you just invalidated that with the copy, you'd need to 'virsh snapshot-delete --metadata vm1 snap1' to get rid of the stale snapshot (if you don't tweak step 1 to avoid creating that snapshot metadata in the first place).

The NICE part about this whole sequence is that the backing file does NOT have to be qcow2, and it is VERY efficient timewise, if you happen to have an efficient way to do step 2. That is, I can go from a multi-gigabyte raw file A.img to raw file B.img in less than a second, assuming the guest isn't doing much I/O in the meantime, when scripting all these steps together, and without any guest downtime.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On Tue, Oct 07, 2014 at 05:35:00PM -0600, Eric Blake wrote:
> On 09/25/2014 08:26 AM, Kashyap Chamarthy wrote:
> > These notes are based on an IRC conversation with Eric Blake about efficient non-shared storage live migration. Thought I'd post my notes here before I forget. Please review and point out any inaccuracies.
> >
> > Procedure
> > ---------
> >
> > (1) Starting from disk A, create a snapshot A <- A':
> >
> >     $ virsh snapshot-create-as \
> >         --domain f20vm snap1 snap1-desc \
> >         --diskspec hda,file=/export/vmimages/A'.qcow2 \
> >         --disk-only --atomic
>
> If you are using this snapshot only for the side-effect of growing the chain, you can add --no-metadata here instead of deleting the snapshot later when it gets invalidated [1]. Of course, if you pass --no-metadata, the snapshot name (snap1) and description (snap1-desc) are no longer important.

Right, until a cleaner mechanism for reverting to external snapshots is in place, I should make a habit of passing '--no-metadata' when creating external snapshots, for the above reason (I usually end up deleting the related libvirt metadata as part of cleanup anyway).

> > (2) Background copy of A to B:
> >
> >     $ virsh blockcopy \
> >         --domain vm1 vda /export/vmimages/B.qcow2 \
> >         --wait --verbose --shallow \
> >         --finish
>
> This step is not quite right. You are asking for a shallow copy of the current file for disk 'vda' (that is, A'.qcow2). But that is NOT the same as the base A image.

Oh right, thanks for catching this mistake.

> For this step, libvirt does not yet have an easy way to access the contents of a backing chain of a live domain; you CAN use 'virsh vol-*' commands to do a background copy from storage pools, but it may be easier to just resort to normal file system tools:
>
>     cp /export/vmimages/A.qcow2 /export/vmimages/B.qcow2

Yeah, simple and fewer commands to type too.

> or even rely on storage-array-specific commands to set up a trivial clone with no real time overhead (for example, some iSCSI storage arrays allow efficient copy-on-write cloning of storage volumes by creating a new name that shares the same original contents of A.qcow2 as its starting point; and since we are about to delete A.qcow2 later on, we never need any actual data copying).
>
> > (3) Create an empty B' with backing file B:
> >
> >     $ qemu-img create -f qcow2 -b B.qcow2 \
> >         -o backing_fmt=qcow2 B'.qcow2
> >
> > [or]
> >
> >     $ virsh vol-create-as default B'.qcow2 1G \
> >         --format qcow2 \
> >         --backing-vol B.qcow2 --backing-vol-format qcow2
>
> [side note - we should really teach libvirt to not REQUIRE a size when creating an empty wrapper around an existing image]

Filed: https://bugzilla.redhat.com/show_bug.cgi?id=1150411

> > (4) Do a shallow blockcopy of A' to B':
> >
> >     $ virsh blockcopy \
> >         --domain vm1 vda /export/vmimages/B'.qcow2 \
> >         --wait --verbose --shallow \
> >         --finish
>
> For this to work, you need to also use the --reuse-external flag

True, I self-corrected in my other response in this thread, but thanks for noticing.

> to take advantage of the backing chain already recorded in B'.qcow2 (without the flag, the command will complain that B'.qcow2 already exists if it is a regular file; if it is a block device, it will just silently ignore the contents of the block device and treat B'.qcow2 as though an absolute path to A.qcow2 were its backing file).
>
> > (5) Then live shallow commit of B:
> >
> >     $ virsh blockcommit \
> >         --domain f20vm vda \
> >         --wait --verbose --shallow \
> >         --pivot --active --finish
> >     Block Commit: [100 %]
> >     Successfully pivoted
>
> With steps 2 and 4 corrected, this indeed shortens the chain back down to just B.qcow2. And once this happens, you no longer need the path to A.qcow2 or A'.qcow2; you can also delete B'.qcow2.
>
> But back to the point I made earlier at [1]: if this is all you do, then 'virsh snapshot-list' will still show 'snap1' as a snapshot that tries to refer to A'.qcow2; since you just invalidated that with the copy, you'd need to 'virsh snapshot-delete --metadata vm1 snap1' to get rid of the stale snapshot (if you don't tweak step 1 to avoid creating that snapshot metadata in the first place).

Thanks for this reminder, I'll script this as part of my tests to ensure it's not missed.
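
Scripting that cleanup might look roughly like the following. It is a sketch under stated assumptions, not the script referred to above: `metadata_delete_cmds` is a hypothetical helper, it expects snapshot names on stdin in the one-per-line form printed by 'virsh snapshot-list --name', and it assumes every name it is given is stale. It prints the 'virsh snapshot-delete --metadata' commands instead of running them so they can be reviewed first.

```shell
#!/bin/sh
# metadata_delete_cmds DOMAIN: read snapshot names on stdin, one per line,
# and print one metadata-only delete command per name.
# Assumption: every name fed in is stale; filter the list beforehand if
# some snapshots of the domain are still valid.
metadata_delete_cmds() {
    domain=$1
    while IFS= read -r name; do
        [ -n "$name" ] || continue   # skip blank lines in the listing
        printf 'virsh snapshot-delete --metadata %s %s\n' "$domain" "$name"
    done
}

# Example:
#   virsh snapshot-list --name vm1 | metadata_delete_cmds vm1
```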

> The NICE part about this whole sequence is that the backing file does NOT have to be qcow2, and it is VERY efficient timewise, if you happen to have an efficient way to do step 2. That is, I can go from a multi-gigabyte raw file A.img to raw file B.img in less than a second, assuming the guest isn't doing much I/O in the meantime, when scripting all these steps together, and without any guest downtime.

Thanks again for your meticulous review.

-- /kashyap

participants (3)
- Eric Blake
- Kashyap Chamarthy
- Stefan Hajnoczi