[libvirt-users] virsh blockcommit fails regularly (was: virtual drive performance)

14 Aug 2017

Hi,

A small update on this. We have migrated the virtual machine to the
virtio drivers, and drive performance has improved: we now see a constant
transfer rate. Previously the rate was the same, but it regularly dropped
to a few bytes/sec for a few seconds and then became fast again.

However, we still observe that the following fails regularly:

$ virsh snapshot-create-as --domain domain --name backup --no-metadata \
    --atomic --disk-only --diskspec hda,snapshot=external
$ virsh blockcommit domain hda --active --pivot
error: failed to pivot job for disk hda
error: block copy still active: disk 'hda' not ready for pivot yet
Could not merge changes for disk hda of domain. VM may be in invalid state.
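One possible mitigation, offered as an untested assumption rather than a confirmed fix: start the active commit without --pivot (with --wait, virsh should return once the job reaches its synchronized phase), then request the pivot separately with "virsh blockjob domain hda --pivot" and retry for a while as long as libvirt still answers "not ready for pivot yet". The function name pivot_with_retry, the retry count and the delay are hypothetical choices. The command runner is injectable so the logic can be exercised without a libvirt host.

```python
import subprocess
import time
from typing import Callable, Sequence, Tuple

# A runner takes an argv list and returns (exit_code, stdout + stderr).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run_cmd(argv: Sequence[str]) -> Tuple[int, str]:
    """Run a command and return its exit code and combined output."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def pivot_with_retry(domain: str, disk: str, run: Runner = run_cmd,
                     attempts: int = 10, delay: float = 5.0) -> bool:
    """Request the pivot of an already running active block commit.

    Hypothetical helper: retries only while libvirt reports that the job
    is not ready yet; any other error is returned to the caller at once.
    """
    for attempt in range(attempts):
        code, out = run(["virsh", "blockjob", domain, disk, "--pivot"])
        if code == 0:
            return True
        if "not ready for pivot yet" not in out:
            return False  # unrelated failure: do not retry blindly
        if attempt + 1 < attempts:
            time.sleep(delay)  # give libvirt time to mark the job ready
    return False
```

In a cron job this would follow something like run_cmd(["virsh", "blockcommit", "domain", "hda", "--active", "--wait"]), and a False result should alert an operator, because the guest is then still writing into the overlay.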

Running the following the next morning then succeeds and pivots the
snapshot into the base image while the VM is live:

$ virsh blockjob domain hda --abort
$ virsh blockcommit domain hda --active --pivot
Successfully pivoted
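To check which image the guest is actually writing to after such a recovery, one option is to compare the source path that "virsh domblklist domain" reports for hda with the base image path. The sketch below assumes the usual two-column Target/Source table that virsh prints (a header, a dashed separator line, then one row per disk). The helper names are hypothetical, and the example path is the one from the disk XML quoted further down in this thread. It only shows libvirt's view of the active image; it does not prove that all data was committed.

```python
import subprocess
from typing import Dict


def parse_domblklist(text: str) -> Dict[str, str]:
    """Map each disk target (e.g. 'hda') to its source path.

    Assumes the default `virsh domblklist` layout: a header row, a line of
    dashes, then rows of '<target> <source>' separated by whitespace.
    """
    disks: Dict[str, str] = {}
    past_separator = False
    for line in text.splitlines():
        stripped = line.strip()
        if not past_separator:
            # Skip everything up to and including the dashed separator.
            past_separator = stripped.startswith("---")
            continue
        if not stripped:
            continue
        parts = stripped.split(None, 1)
        disks[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return disks


def active_source(domain: str, disk: str) -> str:
    """Return the image libvirt currently uses for `disk` (needs a live host)."""
    out = subprocess.run(["virsh", "domblklist", domain],
                         capture_output=True, text=True, check=True).stdout
    return parse_domblklist(out).get(disk, "")
```

After a successful pivot, active_source("domain", "hda") should return the base image path. If it still returns the overlay file, the guest is still writing into the snapshot overlay.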

We run the backup once a day, and it failed on the following days:

2017-07-07
2017-07-20
2017-07-27
2017-08-12
2017-08-14

So it happens roughly once a week, and from then on the guest keeps
writing into the snapshot overlay file. That file grows by about 8 GB
every day, so the issue always needs immediate attention.

Any ideas what could cause this? Is this a bug (a race condition) in
`virsh blockcommit` that makes it fail when it is invoked at the wrong
time?

Cheers,
Dominik

2017-07-07 9:21 GMT+02:00 Dominik Psenner <dpsenner@gmail.com>:
...
Of course, the cron job fails at the virsh blockcommit step, not when
creating the snapshot. Sorry for the noise.
2017-07-07 9:15 GMT+02:00 Dominik Psenner <dpsenner@gmail.com>:
...
Hi,
Different day, same issue. The cron job runs and fails:
$ virsh snapshot-create-as --domain domain --name backup --no-metadata \
    --atomic --disk-only --diskspec hda,snapshot=external
error: failed to pivot job for disk hda
error: block copy still active: disk 'hda' not ready for pivot yet
Could not merge changes for disk hda of domain. VM may be in invalid state.
Running the following the next morning then succeeds and pivots the
snapshot into the base image while the VM is live:
$ virsh blockjob domain hda --abort
$ virsh blockcommit domain hda --active --pivot
Successfully pivoted
Having to intervene manually like this is getting tiring.
Is anyone else seeing the same issue, or does anyone have an idea what
the cause could be?
Can I trust the output? Is the base image really up to date?
Cheers
2017-07-02 10:30 GMT+02:00 Dominik Psenner <dpsenner@gmail.com>:
...
Just a quick follow-up. This time I was able to resolve the issue by
running:
virsh blockjob domain hda --abort
virsh blockcommit domain hda --active --pivot
Last time I had to shut down the virtual machine and do this while it
was offline.
Thanks, Wang, for your input. As far as memory goes, there is plenty of
headroom:
$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7.8G        1.8G        407M        9.7M        5.5G        5.5G
Swap:          8.0G        619M        7.4G
2017-07-02 10:26 GMT+02:00 王李明 <wanglm@certusnet.com.cn>:
...
Maybe this is because the memory of your physical host is small, which
would make the virtual machine unstable. But I'm just guessing.
You could try increasing the memory.
Wang Liming
*From:* libvirt-users-bounces@redhat.com [mailto:libvirt-users-bounces@redhat.com] *On Behalf Of* Dominik Psenner
*Sent:* 2 July 2017 16:22
*To:* libvirt-users@redhat.com
*Subject:* Re: [libvirt-users] virtual drive performance
Hi again,
just today an issue I thought was resolved popped up again. We back up
the machine by doing:
virsh snapshot-create-as --domain domain --name backup --no-metadata \
    --atomic --disk-only --diskspec hda,snapshot=external
# backup hda.qcow2
virsh blockcommit domain hda --active --pivot
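The three steps above could be wrapped in a small script that, when the pivot fails, automatically performs the manual recovery described elsewhere in this thread (virsh blockjob domain hda --abort, followed by a second virsh blockcommit domain hda --active --pivot). This is a minimal sketch under stated assumptions, not a tested backup tool: nightly_backup and the copy_base callback (standing in for the "# backup hda.qcow2" step) are hypothetical, and allowing exactly one recovery attempt is an arbitrary choice.

```python
import subprocess
from typing import Callable, Sequence, Tuple

# A runner takes an argv list and returns (exit_code, stdout + stderr).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run_cmd(argv: Sequence[str]) -> Tuple[int, str]:
    """Run a command and return its exit code and combined output."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def nightly_backup(domain: str, disk: str, copy_base: Callable[[], None],
                   run: Runner = run_cmd) -> bool:
    """Snapshot, copy the base image, then commit the overlay back.

    Hypothetical helper. Returns True once the commit and pivot succeeded.
    """
    snapshot = ["virsh", "snapshot-create-as", "--domain", domain,
                "--name", "backup", "--no-metadata", "--atomic",
                "--disk-only", "--diskspec", f"{disk},snapshot=external"]
    code, _ = run(snapshot)
    if code != 0:
        return False  # no overlay was created, so there is nothing to commit

    # Hypothetical copy step: the guest now writes to the overlay, so the
    # base image is no longer modified while it is being copied.
    copy_base()

    commit = ["virsh", "blockcommit", domain, disk, "--active", "--pivot"]
    code, _ = run(commit)
    if code == 0:
        return True

    # Same recovery as the manual one in the thread: abort the stuck job,
    # then start the active commit again.
    run(["virsh", "blockjob", domain, disk, "--abort"])
    code, _ = run(commit)
    return code == 0
```

If copy_base raises, the overlay stays active and the exception propagates, so the cron job's own error reporting still fires.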
Every now and then this process fails with the following error message:
error: failed to pivot job for disk hda
error: block copy still active: disk 'hda' not ready for pivot yet
Could not merge changes for disk hda of domain. VM may be in invalid state.
Live backups are a great asset and I would expect them to work. Is this
a bug that may also be related to the virtual drive performance issues we
observe?
Cheers
2017-07-02 10:10 GMT+02:00 Dominik Psenner <dpsenner@gmail.com>:
Hi,
a small update on this. I just migrated the VM from the site to my
laptop and started it. The exact same XML configuration (apart from file
paths and such) boots and bursts at 50Mb/s to 115Mb/s in the guest. The
only reasonable explanation I can see is that the CPU in my laptop is
somehow better suited to emulating I/O than the CPU in the on-site host,
an HP ProLiant MicroServer Gen8 with a Xeon processor. However, that
processor never reaches 100% utilization while the guest copies files.
I just ran another test by copying a 3 GB file in the guest. On my
computer the copy does not run at a constant rate: it starts at 90Mb/s,
drops to 30Mb/s, goes up to 70Mb/s, drops to 1Mb/s, goes up to 75Mb/s,
drops to 1Mb/s, goes up to 55Mb/s, and the pattern continues. Note that
the drive is still configured as:
<driver name='qemu' type='qcow2' cache='none' io='threads'/>
I would expect a constant rate, either high or low, since no caching is
involved and the underlying drive is a Samsung 850 EVO SSD. To give an
idea of how fast that drive is on my laptop:
$ dd if=/dev/zero of=testfile bs=1M count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 2.47301 s, 424 MB/s
I can also observe that the smaller the written chunks are, the lower
the overall throughput:
$ dd if=/dev/zero of=testfile bs=512K count=1000 oflag=direct
1000+0 records in
1000+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 1.34874 s, 389 MB/s
$ dd if=/dev/zero of=testfile bs=5K count=1000 oflag=direct
1000+0 records in
1000+0 records out
5120000 bytes (5.1 MB, 4.9 MiB) copied, 0.105109 s, 48.7 MB/s
$ dd if=/dev/zero of=testfile bs=1K count=10000 oflag=direct
10000+0 records in
10000+0 records out
10240000 bytes (10 MB, 9.8 MiB) copied, 0.668438 s, 15.3 MB/s
$ dd if=/dev/zero of=testfile bs=512 count=20000 oflag=direct
20000+0 records in
20000+0 records out
10240000 bytes (10 MB, 9.8 MiB) copied, 1.10964 s, 9.2 MB/s
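The dd numbers above can be turned into per-write figures, which makes the pattern easier to see: dividing bytes by elapsed time reproduces dd's MB/s (dd uses decimal megabytes), and dividing elapsed time by the number of writes gives the average time per O_DIRECT write. The sketch only reuses the measurements quoted above; reading the result as fixed per-request overhead is an interpretation, not something the numbers prove on their own.

```python
# (block size in bytes, number of writes, elapsed seconds) from the dd runs above
RUNS = [
    (1024 * 1024, 1000, 2.47301),
    (512 * 1024, 1000, 1.34874),
    (5 * 1024, 1000, 0.105109),
    (1024, 10000, 0.668438),
    (512, 20000, 1.10964),
]


def summarize(block: int, count: int, seconds: float) -> tuple:
    """Return (MB/s with MB = 10**6 bytes, writes per second, microseconds per write)."""
    total_bytes = block * count
    return (total_bytes / seconds / 1e6, count / seconds, seconds / count * 1e6)


for block, count, seconds in RUNS:
    mb_s, iops, usec = summarize(block, count, seconds)
    print(f"bs={block:>8} B  {mb_s:7.1f} MB/s  {iops:8.0f} writes/s  {usec:8.1f} us/write")
```

While the block size shrinks tenfold from 5 KiB to 512 bytes, the time per write only roughly halves (about 105 µs to about 55 µs), so throughput falls by roughly a factor of five.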
Could this be a limiting factor? Does qemu/kvm issue many writes of just
a few bytes each?
Ideas, anyone?
Cheers
2017-06-21 20:46 GMT+02:00 Dan <srwx4096@gmail.com>:
...
On Tue, Jun 20, 2017 at 3:38 PM, Dominik Psenner <dpsenner@gmail.com> wrote:
...
to the following:
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none'/>
  <source file='/var/data/virtuals/machines/windows-server-2016-x64/image.qcow2'/>
  <backingStore/>
  <target dev='hda' bus='scsi'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
Do you see any gotchas in this configuration that could prevent the
virtualized guest from powering on and booting?
On Tue, Jun 20, 2017 at 04:24:32PM +0200, Gianluca Cecchi wrote:
When I configure it like this, from a Linux guest's point of view I get
...
Symbios Logic SCSI Controller:
00:08.0 SCSI storage controller: LSI Logic / Symbios Logic 53c895a
But this is true only if you add the SCSI controller too, not only the
disk definition.
In my case:
<controller type='scsi' index='0'>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</controller>
Note the slot='0x08', which is reflected in the first field of the lspci
output inside my Linux guest.
So you have to add the SCSI controller alongside your other controllers.
In my case (Fedora 25 with virt-manager-1.4.1-2.fc25.noarch,
qemu-kvm-2.7.1-6.fc25.x86_64, libvirt-2.2.1-2.fc25.x86_64), with "Disk
bus" set to SCSI in virt-manager, the XML definition of the guest is
automatically updated with the controller if it does not exist yet.
And the disk definition section looks like this:
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/slaxsmall.qcow2'/>
  <target dev='sda' bus='scsi'/>
  <boot order='1'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
So I think you should set dev='sda', not 'hda', in your XML.
I'm actually very curious whether that would make a difference. I don't
have such a Windows VM image ready to test at present.
Dan
...
I don't know whether W2016 already has the Symbios Logic drivers
installed, so that a "simple" reboot would lead to an automatic
reconfiguration of the guest...
Note also that Windows may ask you to register again when it considers
the hardware configuration heavily changed (I don't think switching from
IDE to SCSI should trigger that...)
Gianluca
...
_______________________________________________
libvirt-users mailing list
libvirt-users@redhat.com
https://www.redhat.com/mailman/listinfo/libvirt-users
--
Dominik Psenner
