[libvirt-users] libvirt unavailable while a VM is in migration?

Hi,

I am running libvirt 0.8.6 on qemu (kvm, really) 0.12.5. I have noticed that while a live migration is running, I cannot do anything else with libvirt -- even 'virsh list' blocks without output until the migration is almost done. (At that point 'virsh list' will dump a final screen showing the VM I just migrated as 'running'; the next run of 'virsh list' no longer displays the VM -- this is why I say "almost done".)

Is there anything I can do to prevent this? Is there a fix for this coming?

MORE DETAILS:

I open a connection to a remote libvirt instance and begin a live-migration using: domain.migrate(remoteCon, flags, None, None, 0)

My flags are: flags = libvirt.VIR_MIGRATE_PEER2PEER | libvirt.VIR_MIGRATE_LIVE | libvirt.VIR_MIGRATE_UNDEFINE_SOURCE | libvirt.VIR_MIGRATE_TUNNELLED

Thanks,
--Igor
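For context, a minimal python sketch of the call described above. Only the migrate() arguments and the flags come from the mail itself; the connection URIs and the domain name 'myvm' are placeholders:

    import libvirt

    # connection to the source hypervisor (where the VM currently runs)
    srcCon = libvirt.open('qemu:///system')
    # connection to the destination hypervisor (placeholder URI)
    remoteCon = libvirt.open('qemu+ssh://dest.example.com/system')

    flags = (libvirt.VIR_MIGRATE_PEER2PEER |
             libvirt.VIR_MIGRATE_LIVE |
             libvirt.VIR_MIGRATE_UNDEFINE_SOURCE |
             libvirt.VIR_MIGRATE_TUNNELLED)

    domain = srcCon.lookupByName('myvm')
    # this call blocks in the calling process until the migration has finished
    domain.migrate(remoteCon, flags, None, None, 0)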

Did a bit more testing. I open a connection to the hypervisor from python using: con = libvirt.open('qemu:///system')

No call that I tried on con returns while a migration is in progress. I tried: listDomainsID, lookupByID, getInfo, getHostname, getType. (A sketch of this test follows the quoted message below.)

This sort of combines with my other outstanding email, about how to use migrateSetMaxDowntime. If I can't even get a domain object for a migrating domain, how could I possibly set max downtime AFTER the migration has begun? I must be missing something crucial. Can someone please give me some wisdom?

--Igor

On Fri, Dec 17, 2010 at 02:06:27AM -0600, Igor Serebryany wrote:
Hi,
I am running libvirt 0.8.6 on qemu (kvm, really) 0.12.5. I have noticed that while a live migration is running, I cannot do anything else with libvirt -- even 'virsh list' blocks without output until the migration is almost done.
(At that point 'virsh list' will dump a final screen showing the VM I just migrated as 'running'; the next run of 'virsh list' no longer displays the VM -- this is why I say "almost done".)
Is there anything I can do to prevent this? Is there a fix for this coming?
MORE DETAILS:
I open a connection to a remote libvirt instance and begin a live-migration using: domain.migrate(remoteCon, flags, None, None, 0)
My flags are: flags = libvirt.VIR_MIGRATE_PEER2PEER | libvirt.VIR_MIGRATE_LIVE | libvirt.VIR_MIGRATE_UNDEFINE_SOURCE | libvirt.VIR_MIGRATE_TUNNELLED
Thanks, --Igor
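A minimal sketch of the test described in the "Did a bit more testing" follow-up above, assuming it is run in a second process while the migration from the first mail is in progress (the domain ID passed to lookupByID is illustrative):

    import libvirt

    # opened in a second process while the migration is already running
    con = libvirt.open('qemu:///system')

    # each of these calls reportedly blocked until the migration was almost done
    print(con.listDomainsID())
    print(con.getHostname())
    print(con.getType())
    print(con.getInfo())
    dom = con.lookupByID(1)   # 1 is an illustrative domain ID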

On Fri, Dec 17, 2010 at 07:00:24AM -0600, Igor Serebryany wrote:
This sort of combines with my other outstanding email, about how to use migrateSetMaxDowntime. If I can't even get a domain object for a migrating domain, how could I possibly set max downtime AFTER the migration has begun?
I did some MORE testing. I obtained a domain object for the same domain in two separate python processes. I began migrating the domain in one of these processes, and then called migrateSetMaxDowntime() on the domain in the other process. Unsurprisingly, the call blocked and did not return until the migration finished. At that point it returned 0, but I have no idea whether the call had any impact on the running migration.

Even more confused about how this is supposed to work now,

--Igor
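For concreteness, a rough sketch of the second process in that test, assuming the migration is already running in the other process (as in the first mail) and that the domain is named 'myvm'; the 50 ms value is an arbitrary example:

    import libvirt

    # second process, started while the other process is migrating 'myvm'
    con = libvirt.open('qemu:///system')
    dom = con.lookupByName('myvm')   # 'myvm' is a placeholder name

    # ask for a maximum migration downtime of 50 ms (arbitrary example value);
    # as described above, this call blocked until the migration had finished
    ret = dom.migrateSetMaxDowntime(50, 0)
    print(ret)   # returned 0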

Hmm, the behavior you get is really weird. Most libvirt APIs are synchronous, so they only return after the operation has finished. If you need to do something else while the operation is running, you need a separate connection to libvirt (opened in a separate process or thread) which you can use for manipulating other domains or, to some extent, even the same domain.

In the migration case, you can use the second connection to monitor migration progress (virDomainGetJobInfo), cancel the migration (virDomainAbortJob), turn a live migration into a non-live migration (virDomainSuspend), or set the maximum downtime (virDomainMigrateSetMaxDowntime).

It's quite strange that, as you say, you have two python processes and virDomainMigrateSetMaxDowntime doesn't finish until the migration itself finishes. I would suggest starting with a simple setup and trying virsh migrate and virsh migrate-setmaxdowntime from separate terminals, but in your first email, you said you can't even run virsh list while migration is running, right?

So could you make sure you have the libvirt debuginfo packages installed or, in case you compile libvirt yourself, that you compile with -g (should be there by default) and don't strip the binaries. Then try to get into the state where a migration is running and virsh list (or virsh migrate-setmaxdowntime) from another terminal is blocked, attach gdb to libvirtd with

    gdb -p $(pidof libvirtd)

type "thread apply all bt" in gdb, and send us the result. I'm curious why it doesn't work for you.

Jirka
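For illustration, a rough python sketch of the second-connection approach described above. The domain name 'myvm', the 50 ms downtime target, and the one-second polling interval are made-up values; the methods used (jobInfo, migrateSetMaxDowntime, abortJob, suspend) are the python counterparts of the C APIs named in the reply:

    import time
    import libvirt

    # separate connection, opened in another process or thread than the
    # one that called domain.migrate()
    con = libvirt.open('qemu:///system')
    dom = con.lookupByName('myvm')

    # ask libvirt to keep the migration downtime below 50 ms (value in ms)
    dom.migrateSetMaxDowntime(50, 0)

    # poll the migration job until it is gone
    while True:
        try:
            info = dom.jobInfo()
        except libvirt.libvirtError:
            break   # the domain may disappear from the source once migration completes
        if info[0] == libvirt.VIR_DOMAIN_JOB_NONE:
            break
        # info[4]/info[3] are processed/total data in bytes
        print('migrated %d of %d bytes' % (info[4], info[3]))
        time.sleep(1)

    # other options while the job is active (not used here):
    #   dom.abortJob()   # cancel the migration
    #   dom.suspend()    # pause the guest, turning live migration into non-live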

On Mon, Dec 20, 2010 at 10:37:56AM +0100, Jiri Denemark wrote:
in your first email, you said you can't even run virsh list while migration is running, right?
Yup -- when I start my migration using my python application, 'virsh list' no longer works from the command line. I will try setting up a simple test case I can use with 'virsh migrate' and let you know the results.
So could you make sure you have the libvirt debuginfo packages installed or, in case you compile libvirt yourself, that you compile with -g (should be there by default) and don't strip the binaries.
I am using the rpm packages distributed from the website, which I've installed on my debian box using alien.

I've attached two gdb traces -- one for when a migration is just running, and another where I've also got a blocked 'virsh list' from the command line. It appears there's no difference between them (except in the data counts in the thread actually doing the migration). I've also attached a trace for the 'virsh' process itself.

Thanks for answering; hopefully we can get to the bottom of this.

--Igor

So could you make sure you have the libvirt debuginfo packages installed or, in case you compile libvirt yourself, that you compile with -g (should be there by default) and don't strip the binaries.
I am using the rpm packages distributed from the website, which I've installed on my debian box using alien.
Ouch, I wonder if that could be the reason... Could you just compile libvirt yourself? Personally, I wouldn't really trust such a transformed package, especially considering the hacky steps you did and described in another email.
I've attached two gdb traces -- one for when a migration is just running, and another where I've also got a blocked 'virsh list' from the command line. It appears there's no difference between them (except in the data counts in the thread actually doing the migration).
I've also attached a trace for the 'virsh' process itself.
Thanks, that was a good idea: according to it, virsh is stuck while it tries to authenticate to libvirtd. Could you run this second virsh list with the LIBVIRT_DEBUG environment variable set to 1 and attach the debug output? Also please check that virsh is stuck in the same place. Hopefully the log will tell us more.

Jirka

On Mon, Dec 20, 2010 at 02:14:42PM +0100, Jiri Denemark wrote:
Ouch, I wonder if that could be the reason... Could you just compile libvirt yourself?
I still had a box around where I was using a hand-compiled libvirt with the following version output:

    Compiled against library: libvir 0.8.5
    Using library: libvir 0.8.5
    Using API: QEMU 0.8.5
    Running hypervisor: QEMU 0.12.5

I experimented there and found the same problem. What's more, the 0.8.5 version appears to have 'virsh list' freeze at the same spot as in the traceback I gave you for 0.8.6. I'm assuming therefore that the package conversion is not the issue here; I'd be happy to set up a hand-compiled 0.8.6, but that would take a little more time.
tries to authenticate to libvirtd. Could you run this second virsh list with the LIBVIRT_DEBUG environment variable set to 1 and attach the debug output? Also please check that virsh is stuck in the same place. Hopefully the log will tell us more.
I have attached two 'virsh list' logs -- one where I just did 'virsh list' and one where I explicitly specified the URI ('virsh -c "qemu:///system" list'). Both appear to be stuck at the same spot. This spot is also the same on libvirt 0.8.5, which has generally identical output.

Let me know what you think, and thanks again for helping me with this.

--Igor
participants (2)
- Igor Serebryany
- Jiri Denemark