[libvirt-users] Migration problem - takes 5 minutes to start moving the memory

Hi,

I'm facing a strange issue while migrating a VM from one hypervisor to another: the migration takes forever to start moving the memory. The VM had no workload whatsoever, just a basic Ubuntu image.

The versions on the hypervisors are: libvirt 1.2.21, qemu 1.2.3

Command to launch the migration:

    virsh migrate --verbose --live --abort-on-error --tunnelled --p2p --auto-converge --copy-storage-inc --xml vm-6160.xml 6160 qemu+tls://<destination_hypervisor>/system

Here is the log output; look at the time elapsed:

    root@virt-hv009:~# virsh domjobinfo 6160
    Job type:          Unbounded
    Time elapsed:      27518 ms
    Data processed:    21.506 GiB
    Data remaining:    29.003 GiB
    Data total:        50.509 GiB
    Memory processed:  0.000 B
    Memory remaining:  520.820 MiB
    Memory total:      520.820 MiB
    File processed:    21.506 GiB
    File remaining:    28.494 GiB
    File total:        50.000 GiB
    Constant pages:    0
    Normal pages:      0
    Normal data:       0.000 B
    Expected downtime: 300 ms
    Setup time:        6 ms

    root@virt-hv009:~# virsh domjobinfo 6160
    Job type:          Unbounded
    Time elapsed:      32331 ms
    Data processed:    25.475 GiB
    Data remaining:    25.034 GiB
    Data total:        50.509 GiB
    Memory processed:  0.000 B
    Memory remaining:  520.820 MiB
    Memory total:      520.820 MiB
    File processed:    25.475 GiB
    File remaining:    24.525 GiB
    File total:        50.000 GiB
    Constant pages:    0
    Normal pages:      0
    Normal data:       0.000 B
    Expected downtime: 300 ms
    Setup time:        6 ms

    root@virt-hv009:~# virsh domjobinfo 6160
    Job type:          Unbounded
    Time elapsed:      49543 ms
    Data processed:    50.000 GiB
    Data remaining:    520.820 MiB
    Data total:        50.509 GiB
    Memory processed:  0.000 B
    Memory remaining:  520.820 MiB
    Memory total:      520.820 MiB
    File processed:    50.000 GiB
    File remaining:    0.000 B
    File total:        50.000 GiB
    Constant pages:    0
    Normal pages:      0
    Normal data:       0.000 B
    Expected downtime: 300 ms
    Setup time:        6 ms

^^^^ Here, after 49 seconds, the disk has been processed, but the memory copy hasn't started.

I'll skip the intermediate logs, but below is the output at 5 minutes and 56 seconds elapsed, still nothing:

    root@virt-hv009:~# virsh domjobinfo 6160
    Job type:          Unbounded
    Time elapsed:      356919 ms
    Data processed:    50.000 GiB
    Data remaining:    520.820 MiB
    Data total:        50.509 GiB
    Memory processed:  0.000 B
    Memory remaining:  520.820 MiB
    Memory total:      520.820 MiB
    File processed:    50.000 GiB
    File remaining:    0.000 B
    File total:        50.000 GiB
    Constant pages:    0
    Normal pages:      0
    Normal data:       0.000 B
    Expected downtime: 300 ms
    Setup time:        6 ms

Just after that, it started to move the memory, but 6 minutes into the job!

    root@virt-hv009:~# virsh domjobinfo 6160
    Job type:          Unbounded
    Time elapsed:      360092 ms
    Data processed:    50.052 GiB
    Data remaining:    453.895 MiB
    Data total:        50.509 GiB
    Memory processed:  53.224 MiB
    Memory remaining:  453.895 MiB
    Memory total:      520.820 MiB
    Memory bandwidth:  98.172 MiB/s
    File processed:    50.000 GiB
    File remaining:    0.000 B
    File total:        50.000 GiB
    Constant pages:    3541
    Normal pages:      13591
    Normal data:       53.090 MiB
    Expected downtime: 300 ms
    Setup time:        6 ms

    root@virt-hv009:~# virsh domjobinfo 6160
    error: failed to get domain '6160'
    error: Domain not found: no domain with matching name '6160'

Here the migration is done. But what made it wait more than 5 minutes before it started moving the memory? Does anyone know, or have any idea what could produce this latency?

- Marco
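
To compare successive `virsh domjobinfo` snapshots like the ones above numerically rather than by eye, the output can be turned into a dictionary. The following is only a minimal sketch: the function name `parse_domjobinfo` and the unit table are illustrative assumptions, not part of libvirt, and the sample text is the 49-second snapshot from this post.

```python
# Binary unit multipliers, matching the suffixes seen in the output above.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_domjobinfo(text):
    """Parse `virsh domjobinfo` output into a {field: value} dict.

    Sizes are converted to bytes (float), "ms" durations and bare
    counters to int; anything else (e.g. "Job type") stays a string.
    """
    info = {}
    for line in text.splitlines():
        key, sep, raw = line.partition(":")
        if not sep:
            continue  # not a "Field: value" line
        key, raw = key.strip(), raw.strip()
        parts = raw.split()
        if len(parts) == 2 and parts[1] in _UNITS:
            info[key] = float(parts[0]) * _UNITS[parts[1]]
        elif len(parts) == 2 and parts[1] == "ms":
            info[key] = int(parts[0])
        elif len(parts) == 1 and parts[0].isdigit():
            info[key] = int(parts[0])
        else:
            info[key] = raw
    return info


# Snapshot taken 49543 ms into the job (copied from the post above).
SAMPLE = """\
Job type:          Unbounded
Time elapsed:      49543 ms
Data processed:    50.000 GiB
Data remaining:    520.820 MiB
Memory processed:  0.000 B
Memory remaining:  520.820 MiB
File processed:    50.000 GiB
File remaining:    0.000 B
Constant pages:    0
"""

job = parse_domjobinfo(SAMPLE)
print(job["Time elapsed"], job["File remaining"], job["Memory processed"])
```

In this snapshot the file (disk) copy has nothing left to send while the memory counter is still at zero, which is the state the job then stays in for the following minutes.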

On Wed, Jun 01, 2016 at 11:59:29 +0200, Marc-Aurèle Brothier - Exoscale wrote:
> Hi,
> I'm facing a strange issue while migrating a VM from one hypervisor to another: the migration takes forever to start moving the memory. The VM had no workload whatsoever, just a basic Ubuntu image. The versions on the hypervisors are: libvirt 1.2.21, qemu 1.2.3
> Command to launch the migration: virsh migrate --verbose --live --abort-on-error --tunnelled --p2p --auto-converge --copy-storage-inc --xml vm-6160.xml 6160 qemu+tls://<destination_hypervisor>/system

You are copying storage too. It takes 5 minutes to copy the storage. The memory migration starts after the storage migration converges.

On Wed, Jun 01, 2016 at 03:55:37PM +0200, Peter Krempa wrote:
> You are copying storage too. It takes 5 minutes to copy the storage. The memory migration starts after the storage migration converges.

I don't think that's it. If you look at the logs provided, you can see that the storage was apparently fully copied after 49 seconds. There were then 5 minutes during which neither the disk nor the memory "processed" numbers increased, before memory copying started. So there's something fishy going on there, whether just bogus stats reporting by qemu or a genuine delay/hang somewhere.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
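
The gap described here can be worked out from the five snapshots in the original post. The sketch below is an illustration only: the `Sample` type and `longest_stall` helper are hypothetical names, and the numbers are copied from the posted `virsh domjobinfo` output. Because the intermediate snapshots were skipped, the result only bounds the stall between two observed samples.

```python
from typing import List, NamedTuple, Optional, Tuple


class Sample(NamedTuple):
    elapsed_ms: int
    file_processed_gib: float
    memory_processed_mib: float


def longest_stall(samples: List[Sample]) -> Optional[Tuple[int, int]]:
    """Return (start_ms, end_ms) of the longest run of consecutive samples
    in which neither the file nor the memory counter changed, or None."""
    best = None
    run_start = None
    for prev, cur in zip(samples, samples[1:]):
        unchanged = (
            cur.file_processed_gib == prev.file_processed_gib
            and cur.memory_processed_mib == prev.memory_processed_mib
        )
        if unchanged:
            if run_start is None:
                run_start = prev.elapsed_ms
            if best is None or cur.elapsed_ms - run_start > best[1] - best[0]:
                best = (run_start, cur.elapsed_ms)
        else:
            run_start = None  # progress was made, so the run ends here
    return best


# Values taken from the snapshots in the original post.
SAMPLES = [
    Sample(27518, 21.506, 0.0),
    Sample(32331, 25.475, 0.0),
    Sample(49543, 50.000, 0.0),
    Sample(356919, 50.000, 0.0),
    Sample(360092, 50.000, 53.224),
]

stall = longest_stall(SAMPLES)
stall_ms = stall[1] - stall[0]
print(f"no progress between {stall[0]} ms and {stall[1]} ms ({stall_ms / 1000:.1f} s)")
```

With the posted numbers the idle window runs from 49543 ms to 356919 ms, i.e. 307376 ms or a little over five minutes, and memory copying begins somewhere between the 356919 ms and 360092 ms snapshots.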

On 01 Jun 2016, at 15:59, Daniel P. Berrange <berrange@redhat.com> wrote:
> I don't think that's it. If you look at the logs provided, you can see that the storage was apparently fully copied after 49 seconds. There were then 5 minutes during which neither the disk nor the memory "processed" numbers increased, before memory copying started. So there's something fishy going on there, whether just bogus stats reporting by qemu or a genuine delay/hang somewhere.

That's correct, the disk was copied pretty quickly and I could see it growing at the destination during those 49 seconds. What would you do to try to figure out what's going on?

Correction: we are using QEMU 2.3 (not 1.2.3; the Ubuntu version syntax, 1:2.3, confused me).

On Thu, Jun 02, 2016 at 02:32:47PM +0200, Marc-Aurèle Brothier - Exoscale wrote:
> That's correct, the disk was copied pretty quickly and I could see it growing at the destination during those 49 seconds. What would you do to try to figure out what's going on?

You could try turning on debugging for the libvirt QEMU driver, so we can see what QMP monitor traffic is going back and forth; there might be something we can spot there that isn't visible in the API stats. E.g. set log_filters="1:qemu" in the libvirtd.conf file and restart libvirtd.

Regards,
Daniel
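
Once debug logging is enabled, one way to locate a multi-minute pause is to look for the largest time gap between consecutive log lines. The sketch below is a rough illustration under stated assumptions: it assumes each line starts with a `YYYY-MM-DD HH:MM:SS.mmm+0000:` timestamp, and the three sample lines are made up for the example rather than taken from a real libvirtd log. `largest_gap` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumed line prefix, e.g. "2016-06-02 12:00:49.500+0000: ".
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\+0000: ")


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the biggest pause
    between consecutive timestamped lines, or None if fewer than two match."""
    best = None
    prev_ts = prev_line = None
    for line in lines:
        m = _TS.match(line)
        if not m:
            continue  # continuation line or unexpected format
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best


# Made-up sample lines (NOT real libvirtd output), just to exercise the code.
LOG = [
    "2016-06-02 12:00:49.500+0000: 4242: debug : example: message A",
    "2016-06-02 12:00:50.000+0000: 4242: debug : example: message B",
    "2016-06-02 12:05:57.000+0000: 4242: debug : example: message C",
]

gap, before, after = largest_gap(LOG)
print(f"{gap:.1f} s of silence between:\n  {before}\n  {after}")
```

On a real log, reading the lines on either side of the largest gap would show which monitor command or event the job was waiting on, if any.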