[libvirt-users] Problem executing VM backups

Hi everyone,

we are suddenly having a problem with executing our backup jobs. For a long time, we have used a shell script which contains the following code to back up all our virtual machines:

    for domain in Testserver Faktura Fileserver Gitolite Jenkins Nexus SimpleHelp VpnGateway Wiki; do
        echo -n "$(date +"%Y-%m-%d %H:%M:%S") starting backup for vm ${domain} ... " >> ${vmlog}
        virsh dumpxml --security-info ${domain} > ${vmdir}/${domain}.xml
        virsh undefine ${domain} >> ${vmlog}
        virsh blockcopy ${domain} /var/lib/libvirt/images/${domain}.img ${vmdir}/${domain}.img --wait --finish >> ${vmlog}
        virsh define ${vmdir}/${domain}.xml >> ${vmlog}
    done

This has worked great for us, but all of a sudden (possibly triggered by an update, since of course we do regular security/package updates on this machine) we are having problems. For some virtual machines, it still works perfectly, but for others, virsh tells us that a block job is still active and therefore the backup fails. This seems to happen to machines at random. However, when we then try to query the active block job, virsh tells us that no block job is active.

Consider the following log from the shell:

    root@gfii-host:~# virsh undefine Gitolite
    error: Failed to undefine domain Gitolite
    error: Requested operation is not valid: cannot undefine transient domain

    root@gfii-host:~# virsh blockcopy Gitolite /var/lib/libvirt/images/Gitolite.img /tmp/test-blockcopy-gitolite.img --wait --finish
    error: block copy still active: disk 'vda' already in active block job

    root@gfii-host:~# virsh blockjob Gitolite /var/lib/libvirt/images/Gitolite.img
    No current block job for /var/lib/libvirt/images/Gitolite.img

    root@gfii-host:~# virsh define /var/local/backup/vms/2016-06-22T013001/Gitolite.xml
    error: Failed to define domain from /var/local/backup/vms/2016-06-22T013001/Gitolite.xml
    error: block copy still active: domain has active block job

Of course we tried to start/stop the virtual machines, rebooted the whole host multiple times etc., but the problem comes back every night. The machine is a Debian Wheezy machine with current updates. We are using the qemu-kvm package from wheezy-backports to enable blockcopy support.

Best regards
Markus

On Wed, Jun 22, 2016 at 08:59:24AM +0200, Markus Ellinger wrote:
Hi everyone,
we are suddenly having a problem with executing our backup jobs. For a long time, we have used a shell script which contains the following code to backup all our virtual machines:
for domain in Testserver Faktura Fileserver Gitolite Jenkins Nexus SimpleHelp VpnGateway Wiki; do
    echo -n "$(date +"%Y-%m-%d %H:%M:%S") starting backup for vm ${domain} ... " >> ${vmlog}
    virsh dumpxml --security-info ${domain} > ${vmdir}/${domain}.xml
    virsh undefine ${domain} >> ${vmlog}
    virsh blockcopy ${domain} /var/lib/libvirt/images/${domain}.img ${vmdir}/${domain}.img --wait --finish >> ${vmlog}
    virsh define ${vmdir}/${domain}.xml >> ${vmlog}
done
[...]
root@gfii-host:~# virsh undefine Gitolite
error: Failed to undefine domain Gitolite
error: Requested operation is not valid: cannot undefine transient domain
root@gfii-host:~# virsh blockcopy Gitolite /var/lib/libvirt/images/Gitolite.img /tmp/test-blockcopy-gitolite.img --wait --finish
error: block copy still active: disk 'vda' already in active block job
root@gfii-host:~# virsh blockjob Gitolite /var/lib/libvirt/images/Gitolite.img
No current block job for /var/lib/libvirt/images/Gitolite.img
I was going to suggest that, in the event that you do see an active block job here, you could abort it via:

    $ virsh blockjob Gitolite /path/to/disk/ --abort

But you say there's no active block operation.

(I doubt it will help, but you might want to supply "--info" to that `virsh blockjob` query. From the manual: "In --info mode, the active job information on the specified disk will be printed.")
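As a rough illustration of the `--info` query suggested above, the following sketch asks for block job information on every disk of a domain rather than on a single path. The function names `disk_targets` and `show_block_jobs` are hypothetical helpers, not part of virsh, and the parsing assumes the usual two-line header that `virsh domblklist` prints.

```shell
#!/bin/sh
# Sketch only: print block job information for each disk of a domain.
# disk_targets and show_block_jobs are illustrative names (assumptions).

# Extract disk target names (vda, vdb, ...) from `virsh domblklist`
# output on stdin: skip the two header lines and any blank lines.
disk_targets() {
    awk 'NR > 2 && NF >= 2 { print $1 }'
}

# Query block job info for every disk target of the given domain.
show_block_jobs() {
    domain=$1
    virsh domblklist "$domain" | disk_targets | while read -r target; do
        echo "== $domain/$target =="
        virsh blockjob "$domain" "$target" --info
    done
}

# Usage (needs a running libvirt daemon):
#   show_block_jobs Gitolite
```

Querying by target name (for example `vda`) instead of by image path may be worth trying here, since the blockcopy error message refers to the disk as 'vda'.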
root@gfii-host:~# virsh define /var/local/backup/vms/2016-06-22T013001/Gitolite.xml
error: Failed to define domain from /var/local/backup/vms/2016-06-22T013001/Gitolite.xml
error: block copy still active: domain has active block job
I've seen bug reports of this error during different block operations in the past (they were fixed), but I can't pinpoint the exact reason why you're seeing it here.

You might want to enable libvirt logging filters (in /etc/libvirt/libvirtd.conf) to get some more useful details:

    log_filters="1:libvirt 1:qemu 1:conf 1:security 3:event 3:json 3:file 3:object 1:util"
    log_outputs="1:file:/var/log/libvirt/libvirtd.log"

(Don't forget to restart the libvirt daemon before performing your test.)

While we're at it, here's another way to perform live backups (which is slightly more efficient), if it is of any help:

    http://wiki.libvirt.org/page/Live-disk-backup-with-active-blockcommit
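For readers following along, the approach described on that wiki page can be sketched roughly as follows: create a disk-only external snapshot so guest writes go to a temporary qcow2 overlay, copy the now-stable original image, then merge the overlay back with an active blockcommit and pivot. This is a minimal sketch under assumptions, not the script used in this thread: the disk target `vda`, the snapshot name `backup-snap`, the overlay path, and the `DRY_RUN` switch are all illustrative.

```shell
#!/bin/sh
# Sketch of a live backup via external snapshot + active blockcommit.
# Assumptions (not from the thread): disk target "vda", snapshot name
# "backup-snap", the overlay path, and the DRY_RUN switch.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_domain() {
    domain=$1    # libvirt domain name
    image=$2     # current disk image (becomes the backing file)
    backup=$3    # destination of the copy
    overlay="${image%.img}.overlay.qcow2"   # hypothetical overlay path
    status=0

    # 1. Redirect guest writes to a temporary qcow2 overlay.
    run virsh snapshot-create-as --domain "$domain" backup-snap \
        --diskspec "vda,file=$overlay" \
        --disk-only --atomic --no-metadata || return 1

    # 2. The original image is no longer written to; copy it.
    run cp --sparse=always "$image" "$backup" || status=1

    # 3. Merge the overlay back into the original image and pivot the
    #    guest onto it again, even if the copy failed.
    run virsh blockcommit "$domain" vda --active --verbose --pivot || return 1

    # 4. The overlay is no longer in use and can be removed.
    run rm -f "$overlay"
    return $status
}

# Usage: DRY_RUN=1 prints the commands without touching any domain.
#   DRY_RUN=1 backup_domain Gitolite /var/lib/libvirt/images/Gitolite.img /tmp/Gitolite.img
```

The blockcommit step runs even when the copy fails, so that the guest is not left running on the temporary overlay.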
Of course we tried to start/stop the virtual machines, rebooted the whole host multiple times etc., but the problem comes back every night.
The machine is a Debian Wheezy machine with current updates. We are using the qemu-kvm package from wheezy-backports to enable blockcopy support.
It would be helpful to specify the explicit versions.

-- /kashyap

Hello,

thank you very much for your suggestions. I changed our backup script to use the method explained in http://wiki.libvirt.org/page/Live-disk-backup-with-active-blockcommit and this night, everything seems to have worked smoothly. The backup is even faster now than before.

Best regards
Markus
-- Dipl.-Inf. Markus Ellinger tel +49-911-148780-14 fax +49-911-148780-44 email ellinger@gfii.de Gesellschaft für Informatik in der Industrie mbH gfii GmbH - Gabrielistr. 3 - 90480 Nürnberg Geschäftsführer: Walter Krug, Markus Ellinger Registergericht: Amtsgericht Nürnberg HRB 22340

On Thu, Jun 23, 2016 at 12:17:28PM +0200, Markus Ellinger wrote:
Hello,
thank you very much for your suggestions. I changed our backup script to use the method explained in http://wiki.libvirt.org/page/Live-disk-backup-with-active-blockcommit and this night, everything seems to have worked smoothly. The backup is even faster now than before.
Great, thanks for confirming. If you can consistently reproduce your other issue with 'blockcopy' using the latest upstream releases (of libvirt & QEMU), please file a bug with all relevant details & logs.

[...]

-- /kashyap

I changed our backup script to use the method explained in http://wiki.libvirt.org/page/Live-disk-backup-with-active-blockcommit and this night, everything seems to have worked smoothly. The backup is even faster now than before.
Does this only work for qcow2 images, or will it also work for LVM images?

On Thu, Jun 23, 2016 at 08:53:08PM +1000, Phill Edwards wrote:
I changed our backup script to use the method explained in http://wiki.libvirt.org/page/Live-disk-backup-with-active-blockcommit and this night, everything seems to have worked smoothly. The backup is even faster now than before.
Does this only work for qcow2 images, or will it also work for LVM images?
If your root backing file is raw (I tested this just now, again) or LVM, it should work just fine. And the overlays will _have_ to be qcow2 files. -- /kashyap

Does this only work for qcow2 images, or will it also work for LVM images?
If your root backing file is raw (I tested this just now, again) or LVM, it should work just fine. And the overlays will _have_ to be qcow2 files.
All my VM images are raw format on LVM volumes (hope I'm using the right terminology here). I don't use backing/overlay images. So this should work for me. I currently use LVM snapshots as part of my backup process. Does anyone see any pros and cons of the active block commit method in this thread vs using LVM snapshots (which are very quick to create)?

From: libvirt-users-bounces@redhat.com [mailto:libvirt-users-bounces@redhat.com] On Behalf Of Phill Edwards
Sent: Friday, 24 June 2016 5:39
To: Kashyap Chamarthy
CC: libvirt-users@redhat.com; Markus Ellinger
Subject: Re: [libvirt-users] Problem executing VM backups
Does this only work for qcow2 images, or will it also work for LVM images?
If your root backing file is raw (I tested this just now, again) or LVM, it should work just fine. And the overlays will _have_ to be qcow2 files.
All my VM images are raw format on LVM volumes (hope I'm using the right terminology here). I don't use backing/overlay images. So this should work for me. I currently use LVM snapshots as part of my backup process. Does anyone see any pros and cons of the active block commit method in this thread vs using LVM snapshots (which are very quick to create)?
It’s not difficult to use external snapshots. Just follow the info: http://wiki.libvirt.org/page/Live-disk-backup-with-active-blockcommit
For me the main advantage is the ‘quiesce’ option. With this option, the guest disk caches are flushed and the guest file system is frozen until the snapshot is created. This ensures the data integrity of your disk. I think there can still be a risk for databases, though…
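To make the 'quiesce' option concrete, here is a hedged sketch of taking the external snapshot with `--quiesce` and falling back to a plain snapshot if that fails. It assumes the QEMU guest agent is installed and running inside the guest, which `--quiesce` relies on; the function name `create_snapshot`, the disk target `vda`, and the snapshot name `backup-snap` are illustrative, not taken from the thread.

```shell
#!/bin/sh
# Sketch: prefer a quiesced (file-system-frozen) snapshot, fall back to a
# crash-consistent one. Assumptions: disk target "vda", snapshot name
# "backup-snap"; --quiesce needs the QEMU guest agent inside the guest.

create_snapshot() {
    domain=$1     # libvirt domain name
    overlay=$2    # path of the qcow2 overlay to create

    # First attempt: freeze guest file systems while the snapshot is taken.
    if virsh snapshot-create-as --domain "$domain" backup-snap \
            --diskspec "vda,file=$overlay" \
            --disk-only --atomic --no-metadata --quiesce; then
        echo "quiesced"
        return 0
    fi

    # Fallback: same snapshot without freezing the guest file systems.
    echo "warning: quiesce failed for $domain, retrying without it" >&2
    virsh snapshot-create-as --domain "$domain" backup-snap \
        --diskspec "vda,file=$overlay" \
        --disk-only --atomic --no-metadata && echo "crash-consistent"
}

# Usage (needs a running libvirt daemon):
#   create_snapshot Gitolite /var/lib/libvirt/images/Gitolite.overlay.qcow2
```

Whether a silent fallback is acceptable is a design choice; for database guests it may be preferable to abort the backup instead, given the remaining risk mentioned above.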
participants (4)
- Dominique Ramaekers
- Kashyap Chamarthy
- Markus Ellinger
- Phill Edwards