Re: [libvirt-users] virsh blockcommit fails regularily (was: virtual drive performance)

Monday, 14 August 2017

On Mon, Aug 14, 2017 at 08:42:24 +0200, Dominik Psenner wrote:
...
 Hi, 
Hi,

...

 a small update on this. We have migrated the virtualized host to use the
 virtio drivers and now the drive performance is improved so that we can see
 a constant transfer rate. Before it used to be the same rate but regularly
 dropped to a few bytes/sec for a few seconds and then was fast again.

 However, we still observe that the following fails regularly:

 $ virsh snapshot-create-as --domain domain --name backup --no-metadata \
     --atomic --disk-only --diskspec hda,snapshot=external
 $ virsh blockcommit domain hda --active --pivot
 error: failed to pivot job for disk hda
 error: block copy still active: disk 'hda' not ready for pivot yet
 Could not merge changes for disk hda of domain. VM may be in invalid state.

Since this thread was renamed, please restate the version of libvirt
you are using. I don't really want to dig through the old thread.

...
 Then running the following in the morning succeeds and pivots
 the snapshot into the base image while the VM is live:

 $ virsh blockjob domain hda --abort
 $ virsh blockcommit domain hda --active --pivot
 Successfully pivoted

 We run the backup process once a day, and it failed on the following
 days:

 2017-07-07
 2017-07-20
 2017-07-27
 2017-08-12
 2017-08-14

 Looking at this, it happens roughly once a week, and from then on the guest
 writes into the snapshot backlog. That snapshot backlog file grows by about
 8 GB every day, so the issue always needs immediate attention.

 Any ideas what could cause this issue? Is this a bug (race condition) in
 `virsh blockcommit` that sometimes fails because it is invoked at the wrong
 time?

So the 'virsh blockcommit domain hda --active --pivot' operation
consists of 3 parts:

1) virsh blockcommit domain hda --active
2) waiting until the block job finishes
3) virsh blockjob --pivot domain hda

The problem is that sometimes the wait in step 2 returns too soon, before
the job is ready to be pivoted, and then step 3 fails. This should not
happen any more, since there's code in virsh [1] which waits for the
completion event from libvirtd, which is fired only when the job is
actually ready to be pivoted.

This code has a lot of fallback paths for cases such as libvirtd being
old.

At any rate, pivoting manually later should help. Updating to a more
recent version will probably help as well.
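
As a rough sketch of the manual approach, the three steps can be run
separately and the pivot retried until the job is ready. The function name
commit_and_pivot and the retry count and delay defaults are made up for
illustration; only the virsh invocations themselves are real, and whether
retrying is sufficient on your version is an assumption:

```shell
#!/bin/sh
# Hypothetical helper: commit_and_pivot DOMAIN DISK [RETRIES] [DELAY_SECONDS]
# Runs the active commit, then retries the pivot a bounded number of times.
commit_and_pivot() {
    domain=$1
    disk=$2
    retries=${3:-10}   # assumed default, tune for your setup
    delay=${4:-5}      # seconds between pivot attempts (assumed default)

    # Steps 1 and 2: start the active commit and wait for the copy to finish.
    virsh blockcommit "$domain" "$disk" --active --wait || return 1

    # Step 3: pivot; if the job is not ready yet, wait and try again.
    attempt=1
    while [ "$attempt" -le "$retries" ]; do
        if virsh blockjob "$domain" "$disk" --pivot; then
            return 0
        fi
        sleep "$delay"
        attempt=$((attempt + 1))
    done

    # Give up; the job is left running so it can be inspected or aborted.
    return 1
}
```

Called as 'commit_and_pivot domain hda', it returns non-zero if the pivot
never succeeds, so a backup script can alert instead of silently leaving the
guest writing into the overlay.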

In case you are using a fairly recent version, it's possible that there
are still bugs, though.

Peter

[1]:

commit 7408403560f7d054da75acaab855a95c51a92e2b
Author: Peter Krempa <pkrempa(a)redhat.com>
Date:   Mon Jul 13 17:04:49 2015 +0200

    virsh: Refactor block job waiting in cmdBlockCommit

    Reuse the vshBlockJobWait infrastructure to refactor cmdBlockCommit to
    use the common code. This additionally fixes a bug when working with
    new qemus, where when doing an active commit with --pivot the pivoting
    would fail, since qemu reaches 100% completion but the job doesn't
    switch to synchronized phase right away.

$ git describe --contains 7408403560f7d054da75acaab855a95c51a92e2b
v1.2.18-rc1~33
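
Since the fix is contained in v1.2.18, a quick way to see whether the
installed virsh client is at least that version could look like the sketch
below. The helper name version_ge is made up, and it assumes a sort that
supports -V (GNU coreutils does). It only compares the client version and
says nothing about the libvirtd it talks to:

```shell
#!/bin/sh
# Hypothetical helper: version_ge A B succeeds if dotted version A >= B.
# Relies on 'sort -V' (version sort), which is assumed to be available.
version_ge() {
    # If B sorts first (or both are equal), then A is at least B.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example: version_ge "$(virsh --version)" 1.2.18 && echo "virsh has the fix"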
