[libvirt-users] migration job: unexpectedly failed

Hi,
I am trying to run some tests and analysis while performing a wide-area live migration of a VM between two different network locations. I use libvirt with qemu-kvm as the hypervisor on Linux.
The live migration from A to B completes successfully, but I can't get the reverse path, from B to A, to work (and that is exactly the one I am interested in). To perform the migration I run this command in virsh: *"migrate --live --verbose uno qemu+ssh://root@ip.address/system"*
Although dramatically slow (I guess due to some network bottleneck), the migration starts correctly, but it gets stuck at 96% and after a few minutes it returns this error, without further details: *"Migration: [ 96 %]error: operation failed: migration job: unexpectedly failed"*
The libvirtd.log file on the source host says: *2013-05-07 11:01:18.739+0000: 9538: error : qemuMigrationUpdateJobStatus:945 : operation failed: migration job: unexpectedly failed*
I can't find any clue as to what is causing the error. Do you have any idea or solution?
(I'm not sure if this is the appropriate mailing list; maybe the devs list would have been better?)
Thanks for your help,
Daniele

On 05/07/2013 05:08 AM, Daniele wrote:
Hi, I am trying to run some test and analysis while performing a wide live migration of a VM between two different network location. I use libvirt and qemu-kvm as hypervisor on Linux.
The live wide migration from A to B completes successfully, instead I can't achieve the reverse path, from B to A (and that is right what I am interested to). To perform the migration I run this command in virsh: *"migrate --live --verbose uno qemu+ssh://root@ip.address/system"*
What version of libvirtd are you running on both the source and destination? There is a known nasty bug in a few versions prior to 1.0.5 where migration could trigger a race that would kill the source libvirtd, so if you aren't testing with the latest version on both ends, then upgrade first. Also, what version of qemu are you running on the two ends?
Even if dramatically slow (i guess due to some network bottleneck) the migration starts correctly, but it gets stuck at 96% and after some minute it returns this error: *"Migration: [ 96 %]error: operation failed: migration job: unexpectedly failed"* without further details.
The libvirtd.log file in the source host says: *2013-05-07 11:01:18.739+0000: 9538: error : qemuMigrationUpdateJobStatus:945 : operation failed: migration job: unexpectedly failed*
Does the /var/log/libvirt/qemu/uno.log file on either the source or destination shed more light?
Apparently i can't find any clue of what is causing the error, do you have any idea/solution? (I'm not sure if this is the appropriate mailing-list, maybe it was better the devs list?)
This list is fine.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Thanks for your answer
2013/5/7 Eric Blake <eblake@redhat.com>
On 05/07/2013 05:08 AM, Daniele wrote:
Hi, I am trying to run some test and analysis while performing a wide live migration of a VM between two different network location. I use libvirt and qemu-kvm as hypervisor on Linux.
The live wide migration from A to B completes successfully, instead I can't achieve the reverse path, from B to A (and that is right what I am interested to). To perform the migration I run this command in virsh: *"migrate --live --verbose uno qemu+ssh://root@ip.address/system"*
What version of libvirtd are you running on both the source and destination? There is a known nasty bug in a few versions prior to 1.0.5 where migration could trigger a race that would kill the source libvirtd, so if you aren't testing with the latest version on both ends, then upgrade first. Also, what version of qemu are you running on the two ends?
The source host runs libvirtd 0.9.12, while the destination host runs a modified version of 0.9.8 (which I can't change). Now that you mention it, after the migration failure the libvirt daemon sometimes crashed on the destination host, not on the source host. As for qemu, the source host has qemu-kvm-1.1.2+dfsg-6, while the destination host has qemu-kvm-1.0.
Even if dramatically slow (i guess due to some network bottleneck) the migration starts correctly, but it gets stuck at 96% and after some minute it returns this error: *"Migration: [ 96 %]error: operation failed: migration job: unexpectedly failed"* without further details.
The libvirtd.log file in the source host says: *2013-05-07 11:01:18.739+0000: 9538: error : qemuMigrationUpdateJobStatus:945 : operation failed: migration job: unexpectedly failed*
Does the /var/log/libvirt/qemu/uno.log file on either the source or destination shed more light?
Nothing interesting in the source uno.log, but I hadn't checked the destination host log. These are its last lines:
*savevm: unsupported version 3 for 'i8254' v2*
*load of migration failed*
*2013-05-07 16:27:59.682+0000: shutting down*
I also checked the libvirtd.log on the destination host (which I hadn't looked at before) and it seems interesting. It reports this error:
*2013-05-07 16:27:59.682+0000: 16651: error : qemuMonitorIO:560 : internal error End of file from monitor*
*Caught Segmentation violation dumping internal log buffer:*
...followed by a very long debug log whose timestamps cover the last 13 seconds of the migration. You can see the whole log here: http://db.tt/LcXEvGjF
Thanks again,
Daniele
Apparently i can't find any clue of what is causing the error, do you have any idea/solution? (I'm not sure if this is the appropriate mailing-list, maybe it was better the devs list?)
This list is fine.

On 05/07/2013 10:58 AM, Daniele wrote:
Thanks for your answer
The live wide migration from A to B completes successfully, instead I can't achieve the reverse path, from B to A (and that is right what I am interested to). To perform the migration I run this command in virsh: *"migrate --live --verbose uno qemu+ssh://root@ip.address/system"*
In the source host there is libvirtd 0.9.12, while in the destination host there is a modified version of the 0.9.8 (that I can't change).
There's your problem. In general, migration is backwards-compatible (old going to new should work; if it doesn't, that's a bug we are prepared to fix), but not forwards-compatible (there's no way we can ever commit to guaranteeing that new->old will work in all scenarios). Your best bet for successful migration is to have the same version of software on both sides of the equation, or to be prepared for only old->new to work. Your failure appears to be because you are trying new->old.
Nothing interesting in the source uno.log, but I wasn't checking in the destination host log. These are its last lines: *savevm: unsupported version 3 for 'i8254' v2* *load of migration failed* *2013-05-07 16:27:59.682+0000: shutting down*
I'm also checking the libvirtd.log in the destination host (that i didn't look before) and it seems interesting. It reports this error: *2013-05-07 16:27:59.682+0000: 16651: error : qemuMonitorIO:560 : internal error End of file from monitor* *Caught Segmentation violation dumping internal log buffer:*
Yep, when migrating from new to old qemu, the new qemu was sending stuff the old one choked on.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
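The destination log line quoted above can be picked apart mechanically. The following is a minimal Python sketch, not part of the original exchange; the helper name `find_version_mismatches` is hypothetical, and reading the first number as the version found in the incoming stream and the trailing `vN` as the highest version the destination qemu accepts is an assumption based on the wording of the message.

```python
import re

# Matches the message quoted in the thread, e.g.
#   savevm: unsupported version 3 for 'i8254' v2
# Assumption: the first number is the version in the incoming migration
# stream, the trailing "vN" is what the receiving qemu supports.
SAVEVM_RE = re.compile(
    r"savevm: unsupported version (?P<incoming>\d+) "
    r"for '(?P<device>[^']+)' v(?P<supported>\d+)"
)


def find_version_mismatches(log_text):
    """Return a list of (device, incoming_version, supported_version)."""
    return [
        (m.group("device"), int(m.group("incoming")), int(m.group("supported")))
        for m in SAVEVM_RE.finditer(log_text)
    ]


# The last lines of the destination uno.log, as posted in the thread.
sample = """savevm: unsupported version 3 for 'i8254' v2
load of migration failed
2013-05-07 16:27:59.682+0000: shutting down"""

for device, incoming, supported in find_version_mismatches(sample):
    print(f"{device}: stream carries v{incoming}, destination accepts up to v{supported}")
```

A hit from this pattern in the destination log would point at a device-state version mismatch between the two qemu builds rather than at a network problem.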

Hi Eric,
so I will try to downgrade libvirt on the source host back to version 0.9.8. To completely remove libvirt, is it enough to run apt-get with the purge option? Does the same apply to the qemu package?
Thanks,
Daniele
2013/5/7 Eric Blake <eblake@redhat.com>
On 05/07/2013 10:58 AM, Daniele wrote:
Thanks for your answer
The live wide migration from A to B completes successfully, instead I can't achieve the reverse path, from B to A (and that is right what I am interested to). To perform the migration I run this command in virsh: *"migrate --live --verbose uno qemu+ssh://root@ip.address/system"*
In the source host there is libvirtd 0.9.12, while in the destination host there is a modified version of the 0.9.8 (that I can't change).
There's your problem. In general, migration is backwards-compatible (old going to new should work; if it doesn't, that's a bug we are prepared to fix), but not forwards-compatible (there's no way we can ever commit to guaranteeing that new->old will work in all scenarios). Your best bet for successful migration is to have the same version of software on both sides of the equation, or to be prepared for only old->new to work. Your failure appears to be because you are trying new->old.
Nothing interesting in the source uno.log, but I wasn't checking in the destination host log. These are its last lines: *savevm: unsupported version 3 for 'i8254' v2* *load of migration failed* *2013-05-07 16:27:59.682+0000: shutting down*
I'm also checking the libvirtd.log in the destination host (that i didn't look before) and it seems interesting. It reports this error: *2013-05-07 16:27:59.682+0000: 16651: error : qemuMonitorIO:560 : internal error End of file from monitor* *Caught Segmentation violation dumping internal log buffer:*
Yep, when migrating from new to old qemu, the new qemu was sending stuff the old one choked on.

On 05/08/2013 01:58 AM, Daniele wrote:
[please don't top-post on technical lists]
Hi Eric, so I will try to downgrade libvirt in the source host back to the 0.9.8 version. To completely remove libvirt is enough to run the apt-get command with the purge option?
Unfortunately, I don't use Debian enough to know how to downgrade. Also, 0.9.8 is rather old; you really ought to consider using newer libvirt, such as 1.0.5, just because of the improvements it has (I guess I don't understand why Debian favors such an old version of software).
But if you DO downgrade, be aware that you will have to stop your guest and then restart it before your guest will be running under the older qemu. But if you are going to stop your guest, then you can get away with offline migration (much simpler, as it can work in spite of qemu version mismatch) - that is, live migration is only useful if you are trying to avoid powering down your guest. Conversely, if you are going to insist on live migration because your guest cannot suffer from downtime, then upgrading qemu is your only option.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
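One possible reading of the offline route mentioned above is: shut the guest down, carry its definition over, and start it on the other host. The Python sketch below only builds the command strings and runs nothing. The helper name `offline_move_plan` and the file name `uno.xml` are hypothetical, and the sketch assumes the guest's disk images are reachable at the same path on the destination (shared storage or a prior copy), which the thread does not state.

```python
import shlex


def offline_move_plan(domain, dest_uri, xml_path):
    """Build, but do not run, the virsh steps for moving a stopped guest.

    Assumptions: the disk images are already reachable on the destination,
    and the operator waits for the shutdown to complete before continuing.
    """
    d, uri, xml = (shlex.quote(s) for s in (domain, dest_uri, xml_path))
    return [
        f"virsh shutdown {d}",            # stop the guest on the source
        f"virsh dumpxml {d} > {xml}",     # save its definition locally
        f"virsh -c {uri} define {xml}",   # register it on the destination
        f"virsh -c {uri} start {d}",      # boot it under the destination qemu
    ]


for step in offline_move_plan("uno", "qemu+ssh://root@ip.address/system", "uno.xml"):
    print(step)
```

Because the guest boots fresh on the destination, no device state crosses between the two qemu versions. The saved XML may still need adjusting if it names a machine type the older qemu does not know.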

2013/5/8 Eric Blake <eblake@redhat.com>
On 05/08/2013 01:58 AM, Daniele wrote:
[please don't top-post on technical lists]
Hi Eric, so I will try to downgrade libvirt in the source host back to the 0.9.8 version. To completely remove libvirt is enough to run the apt-get command with the purge option?
Unfortunately, I don't use debian enough to know how to downgrade. Also, 0.9.8 is rather old, you really ought to consider using newer libvirt, such as 1.0.5, just because of the improvements it has (I guess I don't understand why Debian favors such an old version of software).
But if you DO downgrade, be aware that you will have to stop your guest and then restart it before your guest will be running under the older qemu. But if you are going to stop your guest, then you can get away with offline migration (much simpler, as it can work in spite of qemu version mismatch) - that is, live migration is only useful if you are trying to avoid powering down your guest. Conversely, if you are going to insist on live migration because your guest cannot suffer from downtime, then upgrading qemu is your only option.
I downgraded libvirt, so now I have the same version on both the source and the destination host. This is the situation now: in one direction (from B to A) I get exactly the same error as before, whereas from A to B, the direction that was working before, I now get this error: "End of file while reading data: : Input/output error". libvirtd crashes on the destination host, and when I restart it the VM is running there (so it runs on both the destination and the source host simultaneously!). Weird!

On 05/08/2013 12:36 PM, Daniele wrote:
I downgraded libvirt so now I have the same version both in the source and the destination host. Now this is the situation: in one direction (from B to A) there is exactly the same error as before!.. instead from A to B, the direction that before was working, now returns this error: "End of file while reading data: : Input/output error", libvirtd crashes at the destination host, and when i restart it the vm is running (so it runs both in the destination and in the source host simultaneously!)..
That's bad, and probably leads to disk corruption from the guest's point of view. Older versions of libvirt did not handle migration failure as robustly as newer versions. I still don't understand why you are trying to go backwards in time to buggier builds, nor why you can't do an offline migration.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
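A guest that is active on both hosts after a failed migration is the case worth detecting before anything else is attempted. The sketch below is a hypothetical helper, not something from the thread. It assumes the per-host state strings were collected separately, for example from `virsh -c <uri> domstate uno` on each side, and it treats both "running" and "paused" as active, which is a simplification.

```python
# States treated as "the guest holds its disk open on this host".
# Assumption: the strings match what `virsh domstate` prints.
ACTIVE_STATES = {"running", "paused"}


def hosts_with_active_guest(domain_states):
    """Return the sorted host names whose reported state is active."""
    return sorted(h for h, state in domain_states.items() if state in ACTIVE_STATES)


def runs_on_both_sides(domain_states):
    """True when more than one host reports the guest as active."""
    return len(hosts_with_active_guest(domain_states)) > 1


# Hosts A and B as named in the thread; the states are illustrative.
after_failed_migration = {"A": "running", "B": "running"}
if runs_on_both_sides(after_failed_migration):
    print("guest active on:", ", ".join(hosts_with_active_guest(after_failed_migration)))
```

If the check fires, one of the two copies has to be stopped before the guest writes further, since both instances share the same disk.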

On May 8, 2013 9:14 PM, "Eric Blake" <eblake@redhat.com> wrote:
On 05/08/2013 12:36 PM, Daniele wrote:
I downgraded libvirt so now I have the same version both in the source and the destination host. Now this is the situation: in one direction (from B to A) there is exactly the same error as before!.. instead from A to B, the direction that before was working, now returns this error: "End of file while reading data: : Input/output error", libvirtd crashes at the destination host, and when i restart it the vm is running (so it runs both in the destination and in the source host simultaneously!)..
That's bad, and probably leads to disk corruption from the guest's point of view. Older versions of libvirt did not handle migration failure as robustly as newer versions. I still don't understand why you are trying to go backwards in time to buggier builds, nor why you can't do an offline migration.
That's because we need to use that particular modified version of libvirt to implement an exchange of additional messages between the hypervisor and the network router. We are analysing and testing live migration performed over a new protocol called LISP, to improve the network reachability of a VM after the migration.
Since the two versions of libvirt are now the same, what could be the reason for that error? Could it be due to the different versions of KVM?

On 05/08/2013 01:23 PM, Daniele wrote:
Since now the two versions of libvirt are the same, what could be the reason of that error? Could it be due to the different versions of KVM?
Migration is a function of qemu. Libvirt just drives qemu. qemu supports migration from old->new, but if new->old fails, then you are on your own. So yes, make sure you have the same qemu on both ends, if you want to do round trip migrations.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
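The old->new versus new->old rule can be checked before a migration is attempted. The following Python sketch is illustrative only: the function names are hypothetical, the version strings are the ones reported in this thread, and comparing only the leading dotted numbers of a package version is an assumption that ignores distribution patches.

```python
import re


def parse_version(text):
    """Extract the first dotted number, e.g. 'qemu-kvm-1.1.2+dfsg-6' -> (1, 1, 2)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def migration_direction(source_version, dest_version):
    """Classify a migration as 'same', 'old->new' or 'new->old'."""
    src, dst = parse_version(source_version), parse_version(dest_version)
    # Pad with zeros so that (1, 0) and (1, 0, 0) compare as equal.
    width = max(len(src), len(dst))
    src += (0,) * (width - len(src))
    dst += (0,) * (width - len(dst))
    if src == dst:
        return "same"
    return "old->new" if src < dst else "new->old"


# qemu versions from the failing B -> A attempt described in the thread.
print(migration_direction("qemu-kvm-1.1.2+dfsg-6", "qemu-kvm-1.0"))  # new->old
```

Only "same" and "old->new" are directions qemu is expected to support; a "new->old" result matches the failing case discussed here.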
participants (2): Daniele, Eric Blake