[libvirt] surprising <backingStore type='file'> setting in domain.xml

Hello all.

My currently used versions: libvirt-5.2.0 and qemu-4.0.0.

Here is my problem. I've been struggling for a few weeks with strange behaviour by either qemu or libvirt. After a reboot of the hardware node, the $domain.xml suddenly contains a backingStore setting which was not there before the reboot. Something like this:

<devices>
  <emulator>/usr/bin/qemu-system-x86_64</emulator>
  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2'/>
    <source file='/var/lib/libvirt/shinymail/shinymail_weekly.qcow2-2019-05-15'/>
    <backingStore type='file'>
      <format type='qcow2'/>
      <source file='/var/lib/libvirt/images/shinymail.qcow2'/>
    </backingStore>
    <target dev='vda' bus='virtio'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
  </disk>
  ...

This obviously happens after a backup has been running. The backup script looks like this:

<snip>
virsh snapshot-create-as --domain shinymail weekly --diskspec vda,file=/var/lib/libvirt/shinymail/shinymail_weekly.qcow2-$(date +%Y-%m-%d) --disk-only --atomic --no-metadata
cp ...
virsh blockcommit shinymail vda --active --verbose --pivot
<snip>

After that, "virsh domblklist shinymail" does show the right source file, but after a reboot the domain tries to use the weekly snapshot again, which leads to filesystem errors.

Does someone have an idea what could cause such behaviour?

cheers
t.

On 5/16/19 10:20 AM, Thomas Stein wrote:
Hello all.
My currently used versions: libvirt-5.2.0 and qemu-4.0.0.
Here is my problem. I've been struggling for a few weeks with strange behaviour by either qemu or libvirt. After a reboot of the hardware node, the $domain.xml suddenly contains a backingStore setting which was not there before the reboot. Something like this:
<devices>
  <emulator>/usr/bin/qemu-system-x86_64</emulator>
  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2'/>
    <source file='/var/lib/libvirt/shinymail/shinymail_weekly.qcow2-2019-05-15'/>
    <backingStore type='file'>
      <format type='qcow2'/>
      <source file='/var/lib/libvirt/images/shinymail.qcow2'/>
    </backingStore>
    <target dev='vda' bus='virtio'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
  </disk>
  ...
This obviously happens after a backup has been running. The backup script looks like this:
<snip>
virsh snapshot-create-as --domain shinymail weekly --diskspec vda,file=/var/lib/libvirt/shinymail/shinymail_weekly.qcow2-$(date +%Y-%m-%d) --disk-only --atomic --no-metadata
Yes, this matches the effects of this command. Ultimately, I'm TRYING to get my new 'virsh domain-backup' command integrated into the next libvirt release, which has the advantage of performing a backup WITHOUT having to modify the <domain> XML. But until that happens, any time you use 'virsh snapshot-create-as' as part of a sequence for performing backups, you ARE modifying the <domain> XML, and if you want to revert to the external backup, or if...
cp ...
virsh blockcommit shinymail vda --active --verbose --pivot
<snip>
...blockcommit fails for whatever reason to undo the effects of 'snapshot-create-as' in creating a temporary overlay, then yes, you do have to worry about the temporary overlay being in the way, where you'll have to manually edit the <domain> definition to match the actual disk layouts you really want.
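For instance, a minimal check along these lines (a sketch; it assumes the 'shinymail' domain from this thread) shows whether the live and next-boot definitions still agree after the blockcommit:

# Live configuration: which file each disk currently uses
virsh domblklist shinymail

# Next-boot (inactive) configuration: look for a leftover overlay
virsh dumpxml --inactive shinymail | grep -A3 'backingStore'

# If a stale overlay shows up only in the inactive XML, fix it by hand:
virsh edit shinymail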
So after that "dmblklist shinymail" does show the right source file but after a reboot it tries to use the weekly snapshot again which leads to filesystem errors.
Does someone have an idea what could cause such behaviour?
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org

On Thu, May 16, 2019 at 17:20:05 +0200, Thomas Stein wrote:
Hello all.
Hi,
My currently used versions: libvirt-5.2.0 and qemu-4.0.0.
I don't think this is a regression, but it will not hurt to ask. Is this a new problem which was not observed before? (I'm asking because I was messing with the blockjob code lately.)
Here is my problem. I've been struggling for a few weeks with strange behaviour by either qemu or libvirt. After a reboot of
This is definitely a libvirt problem ...
the hardware node, the $domain.xml suddenly contains a backingStore setting which was not there before the reboot. Something like this:
<devices>
  <emulator>/usr/bin/qemu-system-x86_64</emulator>
  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2'/>
    <source file='/var/lib/libvirt/shinymail/shinymail_weekly.qcow2-2019-05-15'/>
    <backingStore type='file'>
      <format type='qcow2'/>
      <source file='/var/lib/libvirt/images/shinymail.qcow2'/>
    </backingStore>
    <target dev='vda' bus='virtio'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
  </disk>
  ...
This obviously happens after a backup has been running. The backup script looks like this:
<snip>
virsh snapshot-create-as --domain shinymail weekly --diskspec vda,file=/var/lib/libvirt/shinymail/shinymail_weekly.qcow2-$(date +%Y-%m-%d) --disk-only --atomic --no-metadata
So the problem is that this command modifies both the running configuration and the inactive configuration (which can be accessed by virsh dumpxml --inactive while the VM is running) ...
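For illustration (a sketch using the 'shinymail' domain from this thread), you can see both copies pick up the overlay right after the snapshot is taken:

# Right after snapshot-create-as, both configs reference the overlay file:
virsh dumpxml shinymail | grep "source file"             # running config
virsh dumpxml --inactive shinymail | grep "source file"  # next-boot config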
cp ...
virsh blockcommit shinymail vda --active --verbose --pivot
<snip>
... while this only modifies the running configuration, as we usually don't have enough metadata stored to be able to undo the inactive (next-boot) config as well.

Since snapshot-create-as recorded the overlay in the next-boot config and blockcommit did not undo it, the next time you boot we will attempt to start the VM with the overlay image which was actually already merged back. If you deleted it after blockcommit, the startup would fail completely.

This unfortunate design flaw in those two APIs originates from tunnel vision while implementing them, as that workflow is usually used with "transient" VMs (VMs which don't have an inactive config in libvirt and vanish after being turned off) and thus don't have a problem with the inactive configuration, because the management app recreates it.

While there are currently efforts underway to make this work properly, you might want to use the following workaround:

virsh dumpxml --inactive --domain shinymail > inactive.xml
virsh snapshot-create .....
cp ...
virsh blockcommit ....
virsh define inactive.xml
rm /var/lib/libvirt/shinymail/shinymail_weekly....
rm inactive.xml

(Unfortunately, while 'blockcommit' has a --delete option, it was never made functional, so you have to delete the overlay manually.) Since the overlay was merged, it makes no sense to keep it around.

Thanks for reporting this problem though, I'll try addressing it with my disk storage handling rework.
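Put together, a complete backup run with this workaround might look like the following sketch. Paths, the domain name, and the weekly overlay naming follow the script quoted above; BACKUP_DIR is an assumption for illustration:

#!/bin/sh
# Sketch of the weekly backup with the save/define workaround applied.
set -e
DOM=shinymail
OVERLAY=/var/lib/libvirt/shinymail/shinymail_weekly.qcow2-$(date +%Y-%m-%d)
BACKUP_DIR=/backup   # assumption: adjust to your backup destination

# 1. Save the inactive (next-boot) config before the snapshot rewrites it.
virsh dumpxml --inactive --domain "$DOM" > /tmp/"$DOM"-inactive.xml

# 2. Take the disk-only external snapshot (guest writes go to the overlay now).
virsh snapshot-create-as --domain "$DOM" weekly \
    --diskspec vda,file="$OVERLAY" \
    --disk-only --atomic --no-metadata

# 3. Copy the now-quiescent base image away.
cp /var/lib/libvirt/images/"$DOM".qcow2 "$BACKUP_DIR"/

# 4. Merge the overlay back into the base image.
virsh blockcommit "$DOM" vda --active --verbose --pivot

# 5. Restore the saved inactive config and remove the merged overlay.
virsh define /tmp/"$DOM"-inactive.xml
rm -f "$OVERLAY" /tmp/"$DOM"-inactive.xml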

On 2019-05-17 09:36, Peter Krempa wrote:
On Thu, May 16, 2019 at 17:20:05 +0200, Thomas Stein wrote:
Hello all.
Hi,
My currently used versions: libvirt-5.2.0 and qemu-4.0.0.
I don't think this is a regression, but it will not hurt to ask. Is this a new problem which was not observed before? (I'm asking because I was messing with the blockjob code lately.)
Yes, it has been happening since 5.1, I guess? Maybe even 5.0.
Here is my problem. I've been struggling for a few weeks with strange behaviour by either qemu or libvirt. After a reboot of
This is definitely a libvirt problem ...
the hardware node, the $domain.xml suddenly contains a backingStore setting which was not there before the reboot. Something like this:
<devices>
  <emulator>/usr/bin/qemu-system-x86_64</emulator>
  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2'/>
    <source file='/var/lib/libvirt/shinymail/shinymail_weekly.qcow2-2019-05-15'/>
    <backingStore type='file'>
      <format type='qcow2'/>
      <source file='/var/lib/libvirt/images/shinymail.qcow2'/>
    </backingStore>
    <target dev='vda' bus='virtio'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
  </disk>
  ...
This obviously happens after a backup has been running. The backup script looks like this:
<snip>
virsh snapshot-create-as --domain shinymail weekly --diskspec vda,file=/var/lib/libvirt/shinymail/shinymail_weekly.qcow2-$(date +%Y-%m-%d) --disk-only --atomic --no-metadata
So the problem is that this command modifies both the running configuration and the inactive configuration (which can be accessed by virsh dumpxml --inactive while the VM is running) ...
cp ...
virsh blockcommit shinymail vda --active --verbose --pivot
<snip>
... while this only modifies the running configuration, as we usually don't have enough metadata stored to be able to undo the inactive (next-boot) config as well.
Since snapshot-create-as recorded the overlay in the next-boot config and blockcommit did not undo it, the next time you boot we will attempt to start the VM with the overlay image which was actually already merged back. If you deleted it after blockcommit, the startup would fail completely.
This unfortunate design flaw in those two APIs originates from tunnel vision while implementing them, as that workflow is usually used with "transient" VMs (VMs which don't have an inactive config in libvirt and vanish after being turned off) and thus don't have a problem with the inactive configuration, because the management app recreates it.
While there are currently efforts underway to make this work properly, you might want to use the following workaround:
virsh dumpxml --inactive --domain shinymail > inactive.xml
virsh snapshot-create .....
cp ...
virsh blockcommit ....
virsh define inactive.xml
rm /var/lib/libvirt/shinymail/shinymail_weekly....
rm inactive.xml

(Unfortunately, while 'blockcommit' has a --delete option, it was never made functional, so you have to delete the overlay manually.) Since the overlay was merged, it makes no sense to keep it around.

Thanks for reporting this problem though, I'll try addressing it with my disk storage handling rework.

Thank you. Will do so.

cheers
t.

On Fri, May 17, 2019 at 09:56:56 +0200, Thomas Stein wrote:
On 2019-05-17 09:36, Peter Krempa wrote:
On Thu, May 16, 2019 at 17:20:05 +0200, Thomas Stein wrote:
Hello all.
Hi,
My currently used versions: libvirt-5.2.0 and qemu-4.0.0.
I don't think this is a regression, but it will not hurt to ask. Is this a new problem which was not observed before? (I'm asking because I was messing with the blockjob code lately.)
Yes, it has been happening since 5.1, I guess? Maybe even 5.0.
Oh, I see what's happening. It was in fact me who broke it. The problem is that blockcommit is indeed supposed to fix the original image in this case, but due to a logic bug my change does not save the inactive configuration back to disk after it is modified. Thus the problem only exposes itself after the libvirt daemon is restarted. I'll send a patch in a moment.
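Based on that description, a minimal way to check whether a given build is affected (a sketch; the service unit name may differ per distribution):

# Run one backup cycle (snapshot-create-as + blockcommit --pivot), then:
virsh dumpxml --inactive shinymail | grep backingStore  # may look clean here
systemctl restart libvirtd                              # unit name varies by distro
virsh dumpxml --inactive shinymail | grep backingStore  # stale overlay reappears on affected builds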