
On Thu, Jul 03, 2014 at 11:08:15AM -0600, Eric Blake wrote:
On 07/02/2014 01:12 PM, Kashyap Chamarthy wrote:
We have this simple chain:
base <- snap1
Let's quickly examine the contents of 'base' and 'snap1' images:
Now, let's do a live blockcopy (with a '--finish' to gracefully finish the mirroring):
$ virsh blockcopy --domain testvm2 vda \
    /export/dst/copy.qcow2 \
    --wait --verbose --finish
This defaults to a full copy (copy.qcow2 will contain everything in the latest state of the original chain, but with no backing file).
If I'm reading the man page of 'blockcopy' correctly, shouldn't it 'flatten' the entire chain, by also copying the contents of base into copy.qcow2? I.e. the 'copy' should have all the files (including the file foo from 'base'):
foo, bar, baz, jazz
True or false?
False. This is NOT a union mount. Sometime in between base and snap1, you deleted foo.
Hmm, I do realize that if I deleted 'foo' in between the above two points you mentioned, it _does_ reflect in snap1. I realized it's me who made a silly mistake, as I quickly did another test which validates your (very eloquent) details further below.

For completeness' sake, a correct test below -- it's the simplest case of blockcopy with a chain depth of 1.

1. Create base image:

    $ qemu-img create -f qcow2 base.qcow2 1G

2. Create a file system on the disk & add file 'foo':

---------------------------------
$ guestfish --rw -a /path/disk.qcow2
[. . .]
><fs> run
><fs> part-disk /dev/sda mbr
><fs> mkfs ext4 /dev/sda1
><fs> list-filesystems
><fs> mount /dev/sda1 /
><fs> touch /foo
><fs> touch /bar
><fs> ls /
foo
lost+found
><fs> exit
--------------------------------

3. Create a snapshot, 'snap1' with backing file as 'base':

    $ qemu-img create -f qcow2 -b base.qcow2 \
        -o backing_fmt=qcow2 snap1.qcow2

3.1. Examine contents of 'snap1', add a couple more files: bar, baz, jazz:

--------------------------------
$ guestfish --rw -a snap1.qcow2
[. . .]
><fs> run
><fs> mout /dev/sda1 /
mout: unknown command
><fs> mount /dev/sda1 /
><fs> ls /
foo
lost+found
><fs> touch /bar
><fs> touch /baz
><fs> touch /jazz
><fs> ls /
bar
baz
foo
jazz
lost+found
--------------------------------

4. Provide SELinux context:

    $ chcon -t svirt_image_t base.qcow2 snap1.qcow2

5. Create a persistent XML file:

----------
$ cat <<EOF > /etc/libvirt/qemu/testvm.xml
<domain type='kvm'>
  <name>testvm</name>
  <memory unit='MiB'>512</memory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>
  </os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/export/src/snap1.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
EOF
----------

6. Perform blockcopy:

    $ virsh blockcopy --domain testvm vda \
        /export/dst/copy.qcow2 \
        --wait --verbose --finish
    Block Copy: [100 %]
    Successfully copied

7. Examine contents of copy.qcow2:

--------------------------------
$ guestfish --ro -a /export/dst/copy.qcow2
><fs> run
><fs> mount /dev/sda1 /
><fs> ls /
bar
baz
foo
jazz
lost+found
><fs> quit
--------------------------------

7.1. Enumerate the backing chain of copy.qcow2, it should be a standalone image:

--------------------------------
$ qemu-img info --backing-chain /export/dst/copy.qcow2
image: /export/dst/copy.qcow2
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 18M
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
--------------------------------
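The "standalone image" check on the `qemu-img info` output above can also be scripted. This is only a minimal sketch: it assumes the human-readable output reports a backing image on a line beginning with "backing file:", the helper name `has_backing_file` is hypothetical, and the sample text is simply the output quoted above.

```python
def has_backing_file(info_text: str) -> bool:
    """Return True if any line of `qemu-img info` text names a backing file."""
    return any(
        line.strip().startswith("backing file:")
        for line in info_text.splitlines()
    )


# Output of `qemu-img info --backing-chain /export/dst/copy.qcow2`, as quoted above.
COPY_INFO = """\
image: /export/dst/copy.qcow2
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 18M
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
"""

# A full (non-shallow) copy should report no backing file.
print(has_backing_file(COPY_INFO))
```

For an overlay such as snap1.qcow2 the same helper would be expected to return True, since its info output carries a "backing file:" line.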
That is recorded in snap1, and when reading a chain, you stop at the first level of the chain that provides information. When flattening, it means you are inherently losing any information about the state that existed before snap1 changed the state, at least when using the flattened chain to try and find that information.
Graphically (well, using ASCII), let's look at it like this. When you start your guest originally, you have a big blank disk being tracked by the base image, and write into some sectors of that disk. So, use "A" to represent the initial OS install, and "X" to represent a sector not yet written:
base:  AAAAXXXXXXXXXXXX
====
guest: AAAAXXXXXXXXXXXX
Then, you modify the guest to write the file foo; represent that with "B" for the sectors that were modified:
base:  AAABBBBBXXXXXXXX
====
guest: AAABBBBBXXXXXXXX
Then you take a snapshot. At the point you take it, snap1 is completely empty, but notice that the guest view of the world is still unchanged:
base:  AAABBBBBXXXXXXXX
snap1: XXXXXXXXXXXXXXXX
====
guest: AAABBBBBXXXXXXXX
Now you make some more modifications, such as deleting foo and creating bar (note that deleting a file can be done by writing one sector, and may still leave remnants of the file behind in other sectors, but in such a way that the guest file system will never retrieve those contents). Represent these changes with "C":
base:  AAABBBBBXXXXXXXX
snap1: XXXCXXXXCCCCXXXX
====
guest: AAACBBBBCCCCXXXX
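The rule "stop at the first level of the chain that provides information" can be sketched per sector. This is a toy model of the diagram above, not qcow2 code: each layer is a string, "X" marks a sector that layer never wrote, and `guest_view` is a hypothetical helper name.

```python
UNWRITTEN = "X"  # diagram marker for a sector this layer never wrote


def guest_view(chain):
    """Resolve what the guest reads; `chain` is ordered base first, active layer last."""
    view = []
    for sector in range(len(chain[0])):
        value = UNWRITTEN
        # Walk from the active layer down towards the base and stop at the
        # first layer that holds data for this sector.
        for layer in reversed(chain):
            if layer[sector] != UNWRITTEN:
                value = layer[sector]
                break
        view.append(value)
    return "".join(view)


base = "AAABBBBBXXXXXXXX"
snap1 = "XXXCXXXXCCCCXXXX"
print(guest_view([base, snap1]))  # AAACBBBBCCCCXXXX
```

Sector 3 is the interesting one: base still holds "B" there, but the guest only ever sees the "C" recorded in snap1.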
When you are doing a full blockcopy, you are asking to create a new file whose contents match what the guest sees. When the copy finally reaches sync, you have:
base:  AAABBBBBXXXXXXXX
snap1: XXXCXXXXCCCCXXXX
copy:  AAACBBBBCCCCXXXX
====
guest: AAACBBBBCCCCXXXX
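A full copy can be pictured as collapsing the chain into a single layer that holds exactly the guest's view. The sketch below is a toy model using the diagram's notation ("X" for an unwritten sector); `flatten` is a hypothetical name, not a libvirt or qemu function.

```python
def flatten(chain, unwritten="X"):
    """Collapse a chain (base first) into one standalone layer."""
    copy = []
    for sectors in zip(*chain):  # one tuple per sector, base first
        written = [s for s in sectors if s != unwritten]
        # The top-most layer that wrote the sector wins.
        copy.append(written[-1] if written else unwritten)
    return "".join(copy)


base = "AAABBBBBXXXXXXXX"
snap1 = "XXXCXXXXCCCCXXXX"
copy = flatten([base, snap1])

print(copy)              # AAACBBBBCCCCXXXX
print(base[3], copy[3])  # B C
```

The second line of output shows the information loss in miniature: the "B" that base holds in sector 3 cannot be recovered from the copy alone.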
The copy operation lasts as long as you want; in that time, the guest can make even more changes. Let's call them "D":
base:  AAABBBBBXXXXXXXX
snap1: XXXDXXXXCCCCDDDD
copy:  AAADBBBBCCCCDDDD
====
guest: AAADBBBBCCCCDDDD
Then you finally abort or pivot the copy. Let's try a pivot, where the next action in the guest causes changes to the disk labeled "E":
base:  AAABBBBBXXXXXXXX
snap1: XXXDXXXXCCCCDDDD

copy:  AAAEBBBBCCCCDDDD
====
guest: AAAEBBBBCCCCDDDD
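The "D" and "E" steps can be replayed with a small toy model. It only mirrors the diagrams (writes during the copy land in both snap1 and the copy; writes after a pivot land in the copy alone) and is not a description of how QEMU implements mirroring; the `Mirror` class and its method names are hypothetical.

```python
class Mirror:
    """Toy model of a block copy that has reached sync."""

    def __init__(self, active, copy):
        self.active = list(active)  # current top of the source chain (snap1)
        self.copy = list(copy)      # destination image
        self.pivoted = False

    def guest_write(self, sector, value):
        if not self.pivoted:
            # Before the pivot the guest still runs on the source chain,
            # and every write is mirrored to the copy as well.
            self.active[sector] = value
        self.copy[sector] = value

    def pivot(self):
        # From here on the guest runs on the copy only.
        self.pivoted = True


m = Mirror(active="XXXCXXXXCCCCXXXX", copy="AAACBBBBCCCCXXXX")
for sector in (3, 12, 13, 14, 15):  # the "D" changes made while mirroring
    m.guest_write(sector, "D")
m.pivot()
m.guest_write(3, "E")               # the "E" change made after the pivot

print("".join(m.active))  # XXXDXXXXCCCCDDDD
print("".join(m.copy))    # AAAEBBBBCCCCDDDD
```

After the pivot, snap1 is frozen at its "D" state while the copy alone picks up "E", which matches the last diagram.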
PS: I've tested the cases of --pivot, --shallow and --reuse-external, will post my notes about them on a wiki.
I hope those help you figure out what's going on.
They do, thanks for taking time to write these abundantly clear details. As my newer test proved, it was a PEBKAC. I really liked the way you denote the 'guest' view and the disk/snapshot views.
You seem to be hoping for a magic bullet that gives you file system union mounts (merge the contents of two different timestamps of a directory's existence in a common file system) - but that is NOT what disk snapshots do.
In reality I wasn't expecting union mounts at all :-) I didn't think of it actively until you explicitly mentioned the topic in so much detail.
All libvirt and qemu can do is block level manipulations, not file system manipulations. I'm not even sure if a file system tool exists that can do file system checkpoints and/or union mount merges; but if it does, it would be something you use in the guest at the file system level, and not something libvirt can manage at the block device sector level.
Understood. Thanks again, for all these details, Eric.

--
/kashyap