
On Mon, Aug 14, 2017 at 08:42:24 +0200, Dominik Psenner wrote:
Hi,
Hi,
A small update on this. We have migrated the virtualized host to the virtio drivers, and drive performance has improved: we now see a constant transfer rate. Before, the rate was the same but regularly dropped to a few bytes/sec for a few seconds and then recovered.
However, we still observe that the following fails regularly:
$ virsh snapshot-create-as --domain domain --name backup --no-metadata --atomic --disk-only --diskspec hda,snapshot=external
$ virsh blockcommit domain hda --active --pivot
error: failed to pivot job for disk hda
error: block copy still active: disk 'hda' not ready for pivot yet
Could not merge changes for disk hda of domain. VM may be in invalid state.
Since this thread was renamed, please re-state the version of libvirt you are using. I don't really want to dig through the old thread.
Then running the following in the morning succeeds and pivots the snapshot into the base image while the VM is live:
$ virsh blockjob domain hda --abort
$ virsh blockcommit domain hda --active --pivot
Successfully pivoted
We run the backup process once a day, and it failed on the following days:
2017-07-07
2017-07-20
2017-07-27
2017-08-12
2017-08-14
Looking at this, it happens roughly once a week, and from then on the guest writes into the snapshot backlog. That snapshot backlog file grows by about 8 GB every day, so the issue always needs immediate attention.
Any ideas what could cause this issue? Is this a bug (race condition) in `virsh blockcommit` that sometimes makes it fail because it is invoked at the wrong time?
So the 'virsh blockcommit domain hda --active --pivot' operation consists of 3 parts:

1) virsh blockcommit domain hda --active
2) waiting until the block job finishes
3) virsh blockjob --pivot domain hda

The problem is that sometimes 2) finishes too soon and then operation 3) fails. This should not happen any more, since there's code in virsh [1] which waits for the completion event from libvirtd, which is fired only when the job is actually ready to be pivoted. This code has a lot of fallback options for the case when libvirtd is old.

At any rate, pivoting manually later should help, and so probably would updating to a more recent version. If you are already using a fairly recent version, it's possible that there are still bugs, though.

Peter

[1]:
commit 7408403560f7d054da75acaab855a95c51a92e2b
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Mon Jul 13 17:04:49 2015 +0200

    virsh: Refactor block job waiting in cmdBlockCommit

    Reuse the vshBlockJobWait infrastructure to refactor cmdBlockCommit to use the common code. This additionally fixes a bug when working with new qemus, where when doing an active commit with --pivot the pivoting would fail, since qemu reaches 100% completion but the job doesn't switch to synchronized phase right away.

$ git describe --contains 7408403560f7d054da75acaab855a95c51a92e2b
v1.2.18-rc1~33
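
For a backup script that has to cope with an older virsh, the three-part sequence described above could be driven by hand, retrying the pivot instead of relying on a single attempt. The following is only a minimal sketch under some assumptions: the helper names (run_virsh, commit_and_pivot), the retry count and the delay are made up for illustration, any failed pivot is treated as "not ready yet", and no block job is assumed to be left over from an earlier run (otherwise it would first have to be aborted with 'virsh blockjob domain hda --abort' as in the morning workaround).

```python
import subprocess
import time


def run_virsh(args):
    """Run virsh with the given arguments; return (exit_code, combined_output)."""
    proc = subprocess.run(["virsh", *args], capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def commit_and_pivot(domain, disk, run=run_virsh, retries=30, delay=10.0,
                     sleep=time.sleep):
    """Start an active block commit, then retry the pivot until it succeeds.

    Returns the number of pivot attempts that were needed.
    Raises RuntimeError if the commit cannot be started or if the pivot
    still fails after `retries` attempts.
    """
    # Part 1: start the active commit. Without --wait or --pivot, virsh
    # returns as soon as the job has been started.
    code, out = run(["blockcommit", domain, disk, "--active"])
    if code != 0:
        raise RuntimeError(f"blockcommit failed: {out.strip()}")

    # Parts 2 and 3: rather than guessing when the job is ready, attempt
    # the pivot and retry after a pause while it is being refused.
    # Assumption: every failure here means "not ready for pivot yet".
    for attempt in range(1, retries + 1):
        code, out = run(["blockjob", domain, disk, "--pivot"])
        if code == 0:
            return attempt
        if attempt < retries:
            sleep(delay)
    raise RuntimeError(
        f"pivot did not succeed after {retries} attempts: {out.strip()}"
    )
```

Calling commit_and_pivot("domain", "hda") would then replace the single 'virsh blockcommit domain hda --active --pivot' invocation. The run and sleep parameters exist only so that the retry logic can be exercised without a running libvirtd.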