On Thu, Jul 03, 2014 at 11:08:15AM -0600, Eric Blake wrote:
On 07/02/2014 01:12 PM, Kashyap Chamarthy wrote:
> We have this simple chain:
>
> base <- snap1
>
> Let's quickly examine the contents of 'base' and 'snap1' images:
>
> Now, let's do a live blockcopy (with a '--finish' to gracefully finish
> the mirroring):
>
> $ virsh blockcopy --domain testvm2 vda \
> /export/dst/copy.qcow2 \
> --wait --verbose --finish
This defaults to a full copy (copy.qcow2 will contain everything in the
latest state of the original chain, but with no backing file).
>
> If I'm reading the man page of 'blockcopy' correctly, shouldn't it
> 'flatten' the entire chain, by also copying the contents of base into
> copy.qcow2? i.e. the 'copy' should have files (including the file foo
> from 'base'):
>
> foo, bar, baz, jazz
>
>
> True or false?
False. This is NOT a union mount. Sometime in between base and snap1,
you deleted foo.
Hmm, I do realize that if I deleted 'foo' in between the above two
points you mentioned, it _does_ reflect in snap1.
I realized it's me who made a silly mistake, as I quickly did another
test which validates your (very eloquent) details further below.
For completeness' sake, a correct test is below -- it's the simplest case
of blockcopy, with a backing chain of depth 1.
1. Create base image:
$ qemu-img create -f qcow2 base.qcow2 1G
1.1. Create a file system on the disk & add file 'foo':
---------------------------------
$ guestfish --rw -a /path/disk.qcow2
[. . .]
><fs> run
><fs> part-disk /dev/sda mbr
><fs> mkfs ext4 /dev/sda1
><fs> list-filesystems
><fs> mount /dev/sda1 /
><fs> touch /foo
><fs> ls /
foo
lost+found
><fs> exit
--------------------------------
2. Create a snapshot, 'snap1' with backing file as 'base':
$ qemu-img create -f qcow2 -b base.qcow2 \
-o backing_fmt=qcow2 snap1.qcow2
2.1. Examine contents of 'snap1', add a couple more files: bar, baz,
jazz:
--------------------------------
$ guestfish --rw -a snap1.qcow2
[. . .]
><fs> run
><fs> mount /dev/sda1 /
><fs> ls /
foo
lost+found
><fs> touch /bar
><fs> touch /baz
><fs> touch /jazz
><fs> ls /
bar
baz
foo
jazz
lost+found
--------------------------------
3. Provide SELinux context:
$ chcon -t svirt_image_t base.qcow2 snap1.qcow2
4. Create a persistent XML file:
----------
$ cat <<EOF > /etc/libvirt/qemu/testvm.xml
<domain type='kvm'>
  <name>testvm</name>
  <memory unit='MiB'>512</memory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>
  </os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/export/src/snap1.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
EOF
----------
5. Perform blockcopy:
$ virsh blockcopy --domain testvm vda \
/export/dst/copy.qcow2 \
--wait --verbose --finish
Block Copy: [100 %]
Successfully copied
6. Examine contents of copy.qcow2:
--------------------------------
$ guestfish --ro -a /export/dst/copy.qcow2
><fs> run
><fs> mount /dev/sda1 /
><fs> ls /
bar
baz
foo
jazz
lost+found
><fs> quit
--------------------------------
6.1. Enumerate the backing chain of copy.qcow2; it should be a
standalone image:
--------------------------------
$ qemu-img info --backing-chain /export/dst/copy.qcow2
image: /export/dst/copy.qcow2
file format: qcow2
virtual size: 1.0G (1073741824 bytes)
disk size: 18M
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
--------------------------------
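To check the same thing in a script rather than by eye, a minimal sketch
could look for a "backing file:" line, which is what qemu-img info prints
for an image that has a backing file. The helper name is_standalone and the
sample text are made up for illustration; the sample just reuses a few lines
of the output above.

```shell
# is_standalone: hypothetical helper; reads `qemu-img info` output on stdin
# and succeeds (exit status 0) when no "backing file:" line is present.
is_standalone() {
    ! grep -q '^backing file:'
}

# Illustration only: a few lines copied from the output above.
# Against a real image this would be:
#   qemu-img info --backing-chain /export/dst/copy.qcow2 | is_standalone
sample='image: /export/dst/copy.qcow2
file format: qcow2
virtual size: 1.0G (1073741824 bytes)'

if printf '%s\n' "$sample" | is_standalone; then
    echo "standalone image"
else
    echo "image has a backing file"
fi
```

Run against snap1.qcow2 instead, the same check should fail, since that
image was created with '-b base.qcow2'.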
That is recorded in snap1, and when reading a chain,
you stop at the first level of the chain that provides information.
When flattening, it means you are inherently losing any information
about the state that existed before snap1 changed the state, at least
when using the flattened chain to try and find that information.
Graphically (well, using ASCII), let's look at it like this. When you
start your guest originally, you have a big blank disk being tracked by
the base image, and write into some sectors of that disk. So, use "A"
to represent the initial OS install, and "X" to represent a sector not
yet written:
base: AAAAXXXXXXXXXXXX
====
guest: AAAAXXXXXXXXXXXX
Then, you modify the guest to write the file foo; represent that with
"B" for the sectors that were modified:
base: AAABBBBBXXXXXXXX
====
guest: AAABBBBBXXXXXXXX
Then you take a snapshot. At the point you take it, snap1 is completely
empty, but notice that the guest view of the world is still unchanged:
base: AAABBBBBXXXXXXXX
snap1: XXXXXXXXXXXXXXXX
====
guest: AAABBBBBXXXXXXXX
Now you make some more modifications, such as deleting foo and creating
bar (note that deleting a file can be done by writing one sector, and
may still leave remnants of the file behind in other sectors, but in
such a way that the guest file system will never retrieve those
contents). Represent these changes with "C":
base: AAABBBBBXXXXXXXX
snap1: XXXCXXXXCCCCXXXX
====
guest: AAACBBBBCCCCXXXX
When you are doing a full blockcopy, you are asking to create a new file
whose contents match what the guest sees. When the copy finally reaches
sync, you have:
base: AAABBBBBXXXXXXXX
snap1: XXXCXXXXCCCCXXXX
copy: AAACBBBBCCCCXXXX
====
guest: AAACBBBBCCCCXXXX
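The layering rule in these diagrams (read each sector from the topmost image
that has it) can be played with in a few lines of shell. This is only a toy
model, assuming "X" means "not written in this layer"; the function name
guest_view is made up, and none of this is how qemu actually stores or reads
qcow2 data.

```shell
# guest_view TOP [BACKING...]: hypothetical helper. Layers are listed from
# the active image down to the base; for every sector, the first layer that
# holds something other than "X" wins.
guest_view() {
    len=${#1}
    i=1
    out=
    while [ "$i" -le "$len" ]; do
        sector=X
        for layer in "$@"; do
            c=$(printf '%s' "$layer" | cut -c "$i")
            if [ "$c" != X ]; then
                sector=$c
                break
            fi
        done
        out=$out$sector
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

base=AAABBBBBXXXXXXXX
snap1=XXXCXXXXCCCCXXXX

# A full (non-shallow) copy ends up holding the guest's view of the chain.
copy=$(guest_view "$snap1" "$base")
echo "copy: $copy"    # prints "copy: AAACBBBBCCCCXXXX"
```

Note that the "B" in sector 4 of base never reaches the copy: snap1 already
overrides it with "C", which is the block-level equivalent of foo having
been deleted.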
The copy operation lasts as long as you want; in that time, the guest
can make even more changes. Let's call them "D":
base: AAABBBBBXXXXXXXX
snap1: XXXDXXXXCCCCDDDD
copy: AAADBBBBCCCCDDDD
====
guest: AAADBBBBCCCCDDDD
Then you finally abort or pivot the copy. Let's try a pivot, after which
the guest writes only to the copy; the next action in the guest causes
changes to the disk labeled "E":
base: AAABBBBBXXXXXXXX
snap1: XXXDXXXXCCCCDDDD
copy: AAAEBBBBCCCCDDDD
====
guest: AAAEBBBBCCCCDDDD
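The "D" and "E" stages can be replayed with the same kind of toy model. This
is a sketch under the assumption that, while the copy job is active, every
guest write ends up reflected in both snap1 and the copy, and that after a
pivot only the copy is written. The helper names write_sectors and
mirrored_write are invented for this illustration.

```shell
# write_sectors IMAGE POS DATA: hypothetical helper that prints IMAGE with
# DATA written over it, starting at 1-based sector POS.
write_sectors() {
    awk -v img="$1" -v pos="$2" -v data="$3" 'BEGIN {
        print substr(img, 1, pos - 1) data substr(img, pos + length(data))
    }'
}

snap1=XXXCXXXXCCCCXXXX
copy=AAACBBBBCCCCXXXX    # already in sync with the guest view

# While the copy job is active, a guest write is reflected in both images.
mirrored_write() {
    snap1=$(write_sectors "$snap1" "$1" "$2")
    copy=$(write_sectors "$copy" "$1" "$2")
}
mirrored_write 4 D
mirrored_write 13 DDDD

# After the pivot, the guest writes to the copy only; snap1 stays as it was.
copy=$(write_sectors "$copy" 4 E)

echo "snap1: $snap1"    # prints "snap1: XXXDXXXXCCCCDDDD"
echo "copy: $copy"      # prints "copy: AAAEBBBBCCCCDDDD"
```

An abort instead of a pivot would be the mirror image of the last step: the
"E" write would go to snap1 and the copy would stay frozen at the "D" state.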
>
>
> PS: I've tested the cases of --pivot, --shallow and --reuse-external,
> will post my notes about them on a wiki.
I hope those help you figure out what's going on.
They do, thanks for taking time to write these abundantly clear details.
As my newer test proved, it was a PEBKAC. I really liked the way you
denote the 'guest' view and the disk/snapshot views.
You seem to be hoping
for a magic bullet that gives you file system union mounts (merge the
contents of two different timestamps of a directory's existence in a
common file system) - but that is NOT what disk snapshots do.
In reality I wasn't expecting union mounts at all :-) I didn't think of
it actively until you explicitly mentioned the topic in so much
detail.
All
libvirt and qemu can do is block level manipulations, not file system
manipulations. I'm not even sure if a file system tool exists that can
do file system checkpoints and/or union mount merges; but if it does, it
would be something you use in the guest at the file system level, and
not something libvirt can manage at the block device sector level.
Understood. Thanks again, for all these details, Eric.
--
/kashyap