On Mon, Aug 14, 2017 at 08:42:24 +0200, Dominik Psenner wrote:
Hi,
a small update on this. We have migrated the virtualized host to use the
virtio drivers and now the drive performance is improved so that we can see
a constant transfer rate. Before it used to be the same rate but regularly
dropped to a few bytes/sec for a few seconds and then was fast again.
However, we still observe that the following regularly fails:
$ virsh snapshot-create-as --domain domain --name backup --no-metadata \
    --atomic --disk-only --diskspec hda,snapshot=external
$ virsh blockcommit domain hda --active --pivot
error: failed to pivot job for disk hda
error: block copy still active: disk 'hda' not ready for pivot yet
Could not merge changes for disk hda of domain. VM may be in invalid state.
Since this thread was renamed, please re-state the version of libvirt
you are using. I don't really want to dig through the old thread.
Running the following in the morning then succeeds and pivots the
snapshot into the base image while the VM is live:
$ virsh blockjob domain hda --abort
$ virsh blockcommit domain hda --active --pivot
Successfully pivoted
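For a nightly backup job, the manual recovery above could be wrapped in a small helper. This is only a sketch: `commit_with_recovery` is a hypothetical name, and it assumes that aborting the leftover job and starting the commit again is safe in your setup, which is what the manual steps above do.

```shell
#!/bin/sh
# Hypothetical helper, not part of libvirt: try the one-shot commit and
# pivot first, and fall back to the manual recovery if it fails.

commit_with_recovery() {
    dom=$1     # libvirt domain name
    disk=$2    # disk target, e.g. hda

    # First attempt: commit the overlay into the base image and pivot.
    if virsh blockcommit "$dom" "$disk" --active --pivot; then
        return 0
    fi

    echo "pivot failed for $disk, aborting the job and retrying" >&2

    # Cancel the leftover block job; ignore the error if none is active.
    virsh blockjob "$dom" "$disk" --abort || true

    # Second attempt; its exit status becomes the function's status.
    virsh blockcommit "$dom" "$disk" --active --pivot
}
```

It would be called as `commit_with_recovery domain hda`. A non-zero return means the second attempt failed as well and the guest is still writing into the snapshot.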
We run the backup process once a day, and it failed on the following
days:
2017-07-07
2017-07-20
2017-07-27
2017-08-12
2017-08-14
Looking at this, it happens roughly once a week, and from then on the guest
writes into the snapshot backlog. That snapshot backlog file grows by about
8 GB every day, so the issue always needs immediate attention.
Any ideas what could cause this issue? Is this a bug (race condition) in
`virsh blockcommit` that sometimes fails because it is invoked at the wrong
time?
So the 'virsh blockcommit domain hda --active --pivot' operation
consists of 3 parts:
1) virsh blockcommit domain hda --active
2) waiting until the block job finishes
3) virsh blockjob --pivot domain hda
The problem is that sometimes 2) finishes too soon and then 3) fails.
This should not happen any more, since there's code in virsh [1]
which waits for the completion event from libvirtd, which is fired only
when the job is actually ready to be pivoted.
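To make the three parts concrete, here is a rough sketch that runs them by hand and simply retries the pivot until libvirt accepts it. Note this is not what virsh does internally (virsh waits for the event); the function name `commit_then_pivot`, the retry count and the delay are assumptions made up for illustration.

```shell
#!/bin/sh
# Sketch only: run the three phases separately instead of relying on
# "virsh blockcommit --pivot".

commit_then_pivot() {
    dom=$1
    disk=$2
    tries=${3:-30}    # assumed: how many times to attempt the pivot
    delay=${4:-2}     # assumed: seconds to sleep between attempts

    # 1) Start the active commit; without --wait this returns right away.
    virsh blockcommit "$dom" "$disk" --active || return 1

    # 2) + 3) Retry the pivot. It is refused while the job is not yet
    # ready, so a failure here is treated as "try again later".
    n=0
    while [ "$n" -lt "$tries" ]; do
        if virsh blockjob "$dom" "$disk" --pivot; then
            return 0
        fi
        n=$((n + 1))
        sleep "$delay"
    done

    echo "disk $disk of $dom still not ready after $tries attempts" >&2
    return 1
}
```

Retrying the pivot avoids parsing the progress output of `virsh blockjob --info`, which is not a reliable readiness signal anyway: the job can report 100% before it is ready. The drawback is that a pivot failing for some other reason is also retried until the attempts run out.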
This code has a lot of fallback paths for cases such as an old
libvirtd.
At any rate, pivoting manually later should help. Updating to a more
recent version probably would too.
If you are already using a fairly recent version, it's possible that
there are still bugs, though.
Peter
[1]:
commit 7408403560f7d054da75acaab855a95c51a92e2b
Author: Peter Krempa <pkrempa(a)redhat.com>
Date:   Mon Jul 13 17:04:49 2015 +0200

    virsh: Refactor block job waiting in cmdBlockCommit

    Reuse the vshBlockJobWait infrastructure to refactor cmdBlockCommit to
    use the common code. This additionally fixes a bug when working with
    new qemus, where when doing an active commit with --pivot the pivoting
    would fail, since qemu reaches 100% completion but the job doesn't
    switch to synchronized phase right away.
$ git describe --contains 7408403560f7d054da75acaab855a95c51a92e2b
v1.2.18-rc1~33
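
Since the commit is contained in v1.2.18-rc1, a quick way to see whether the installed virsh client should already have it is to compare its version against 1.2.18. A minimal sketch, assuming a `sort` that supports `-V` (GNU coreutils does); the function names are made up, and distro packages with backported patches can make the plain version number misleading.

```shell
#!/bin/sh
# Hypothetical check: does the virsh client contain the refactored
# block job waiting code from the commit above?

version_at_least() {
    # True when version $1 is greater than or equal to version $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_virsh() {
    # "virsh --version" prints just the version string of the client.
    have=$(virsh --version) || return 2

    if version_at_least "$have" 1.2.18; then
        echo "virsh $have should include the cmdBlockCommit refactoring"
    else
        echo "virsh $have predates v1.2.18, consider upgrading"
        return 1
    fi
}
```

Running `check_virsh` on the host that performs the backup is enough, because the waiting logic lives in the virsh client rather than in libvirtd.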