Hi,
this is kind of a follow-up to an older question/discussion:
https://www.redhat.com/archives/libvir-list/2010-July/msg00267.html
As a result of that, I use a second thread for monitoring the live
migration, taking actions (setting maxdowntime to a value that fits the
situation) if necessary.
Although I call getJobInfo() at a fairly low frequency (once a second),
problems occur frequently, roughly every 10th to 15th live
migration. They range from exceptions saying that the domain is not
running anymore to complete JVM crashes ->
http://pastebin.com/jT6sXubu
Recovery from the exceptions doesn't work reliably either: they seem to
leave open references behind, so connections to a host can't be shut
down properly.
Of course, in every iteration of my monitoring thread I check that the
domain object is not null, that it is still active, that the jobInfo is
already available, etc. But, as I cannot synchronize with
vm.migrate(), there is still a reasonable chance that migrate()
invalidates the current domain while I'm accessing it, no matter what I do.
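
To make the setup concrete, here is a minimal sketch of such a monitoring
loop. It is an illustration under assumptions, not the actual code: the
names MigrationMonitor, JobPoller and stopAndJoin() are hypothetical, and
JobPoller only stands in for the isActive()/getJobInfo() calls made on the
domain. It does not remove the race with migrate(); it only treats any
exception as "the job is over" and makes sure the monitor has stopped
before the domain or connection is released.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class MigrationMonitor {

    /** Hypothetical stand-in for the per-iteration calls on the domain. */
    interface JobPoller {
        long dataRemaining() throws Exception;
    }

    private final AtomicBoolean migrationDone = new AtomicBoolean(false);
    private final AtomicInteger polls = new AtomicInteger(0);
    private final Thread worker;

    MigrationMonitor(JobPoller poller, long intervalMillis) {
        worker = new Thread(() -> {
            while (!migrationDone.get()) {
                try {
                    long remaining = poller.dataRemaining();
                    polls.incrementAndGet();
                    // Decide here whether the max downtime should be adjusted,
                    // based on 'remaining'.
                } catch (Exception e) {
                    // The domain may have vanished mid-call: stop polling
                    // instead of retrying on a possibly stale handle.
                    break;
                }
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException ie) {
                    break; // woken up by stopAndJoin()
                }
            }
        }, "migration-monitor");
        worker.setDaemon(true);
    }

    void start() {
        worker.start();
    }

    /** Call from the migrating thread once migrate() has returned or thrown. */
    void stopAndJoin() {
        migrationDone.set(true);
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    int polls() {
        return polls.get();
    }

    boolean isAlive() {
        return worker.isAlive();
    }
}
```

The migrating thread would call start() before vm.migrate() and
stopAndJoin() in a finally block afterwards, and only then free the domain
and close the connection. A call that is already in flight when migrate()
finishes is still not protected by this.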
Am I missing something, or is that correct? Any ideas how to solve it
reliably? Is there any experience from virt-manager, where (in my quite
old version) I assume at least the domain is read for cpu etc. stats
while live migrating...
regards,
thomas