Hi,
I hope someone can help me out.
I'm running into an issue where libvirt 1.2.12 reports "operation failed:
domain is no longer running" for a migration that qemu considers successful.
The steps are:
1) create a guest with a stress test running in it that dirties memory at a high
rate (fast enough that live migration would not normally complete)
2) trigger live migration with dom.migrateToURI2()
3) while migration is in progress, call dom.suspend() on the migrating domain.
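Steps 2 and 3 can be sketched in libvirt-python roughly as below. This is a minimal sketch, not my actual test harness: the guest name "stress-guest", the destination URI, the 5-second delay and the LIVE|PEER2PEER flags are all assumptions for illustration. The helper runs migrateToURI2() in a worker thread so that suspend() can be issued while the migration call is still blocked.

```python
import threading
import time


def migrate_and_suspend(dom, dest_uri, flags, suspend_delay=5.0):
    """Run dom.migrateToURI2() in a worker thread and call dom.suspend()
    while it is in flight. Returns the exception raised by the migration
    call, or None if it completed."""
    outcome = {}

    def worker():
        try:
            # Positional args: dconnuri, miguri, dxml, flags, dname, bandwidth
            dom.migrateToURI2(dest_uri, None, None, flags, None, 0)
        except Exception as exc:  # libvirt.libvirtError when using libvirt-python
            outcome["error"] = exc

    t = threading.Thread(target=worker)
    t.start()
    time.sleep(suspend_delay)  # give the (non-converging) migration time to start
    dom.suspend()              # step 3: pause the guest mid-migration
    t.join()
    return outcome.get("error")


def main():
    # libvirt-python is imported here so the helper above has no hard dependency
    import libvirt

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("stress-guest")  # hypothetical guest name
    # With PEER2PEER set, the first argument must be the destination libvirt URI
    flags = libvirt.VIR_MIGRATE_LIVE | libvirt.VIR_MIGRATE_PEER2PEER
    err = migrate_and_suspend(dom, "qemu+ssh://dest-host/system", flags)  # hypothetical host
    print("migrateToURI2 raised:", err)
```

Calling main() on the source host should, on an affected version, print the "domain is no longer running" libvirtError; taking the domain object as a parameter also makes the timing logic easy to exercise with a stand-in object.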
What I see at this point is the following:
a) At time 50.465 the monitoring code sees a VIR_DOMAIN_EVENT_SUSPENDED event,
as expected.
b) An instrumented qemu logs the following:
51.143: done transferring state
51.143: done migration
51.144: qmp_query_migrate reporting state completed
c) At time 51.468 the monitoring code sees a VIR_DOMAIN_EVENT_RESUMED event,
with detail of VIR_DOMAIN_EVENT_RESUMED_UNPAUSED
d) At time 51.469 the monitoring code sees a VIR_DOMAIN_EVENT_RESUMED event,
with detail of VIR_DOMAIN_EVENT_RESUMED_MIGRATED
e) At time 51.471 the dom.migrateToURI2() call raises an exception (this is
Python). The corresponding libvirt log file shows:
"error : virNetClientProgramDispatchError:177 : operation failed: domain is no
longer running"
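For anyone trying to reproduce the timeline above, the lifecycle events can be captured with something like the following. It is a sketch under assumptions, not the monitoring code I actually used: the class and helper names are hypothetical, and the numeric constants mirror libvirt's virDomainEventType / virDomainEventResumedDetailType enum values (prefer the libvirt.VIR_DOMAIN_EVENT_* names when the module is loaded). The unpaused_after_suspend() check flags the suspicious part of the sequence, a RESUMED/UNPAUSED that follows a SUSPENDED without anyone calling resume().

```python
import threading
import time

# Mirrors libvirt's enum values; libvirt.VIR_DOMAIN_EVENT_* is preferable when available.
EVENT_SUSPENDED = 3    # VIR_DOMAIN_EVENT_SUSPENDED
EVENT_RESUMED = 4      # VIR_DOMAIN_EVENT_RESUMED
RESUMED_UNPAUSED = 0   # VIR_DOMAIN_EVENT_RESUMED_UNPAUSED
RESUMED_MIGRATED = 1   # VIR_DOMAIN_EVENT_RESUMED_MIGRATED


class LifecycleRecorder:
    """Collects (elapsed_seconds, event, detail) tuples from lifecycle callbacks."""

    def __init__(self):
        self.t0 = time.monotonic()
        self.events = []

    def callback(self, conn, dom, event, detail, opaque):
        # Callback signature used by conn.domainEventRegisterAny()
        # for VIR_DOMAIN_EVENT_ID_LIFECYCLE.
        self.events.append((round(time.monotonic() - self.t0, 3), event, detail))


def unpaused_after_suspend(events):
    """True if a RESUMED/UNPAUSED event follows a SUSPENDED event."""
    seen_suspend = False
    for _, event, detail in events:
        if event == EVENT_SUSPENDED:
            seen_suspend = True
        elif event == EVENT_RESUMED and detail == RESUMED_UNPAUSED and seen_suspend:
            return True
    return False


def start_monitor(uri, dom_name, recorder):
    """Register the recorder for lifecycle events and pump libvirt's default
    event loop in a daemon thread. Requires libvirt-python."""
    import libvirt

    libvirt.virEventRegisterDefaultImpl()  # must be called before opening the connection
    conn = libvirt.open(uri)
    dom = conn.lookupByName(dom_name)
    conn.domainEventRegisterAny(
        dom, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE, recorder.callback, None)

    def loop():
        while True:
            libvirt.virEventRunDefaultImpl()

    threading.Thread(target=loop, daemon=True).start()
    return conn  # keep a reference so the connection stays open
```

Keeping the check separate from the libvirt plumbing means the recorded event list can be inspected after the migration call returns, whichever version is under test.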
For what it's worth, the problem seems to be fixed in libvirt 1.2.17. In that
version and later I don't see the VIR_DOMAIN_EVENT_RESUMED event; the migration
just completes.
I'm looking at the libvirt history, but I figured I'd ask here too...
Thanks,
Chris