
On Wed, Sep 14, 2011 at 02:20:46PM +0200, Thomas Treutner wrote:
> Hi,
> this is kind of a follow-up to an older question/discussion: https://www.redhat.com/archives/libvir-list/2010-July/msg00267.html
> As a result of that, I use a second thread to monitor the live migration and take action if necessary (setting maxdowntime to a value that fits the situation).
> Although I call getJobInfo() at quite a low frequency (once a second), problems occur frequently, roughly every 10th or 15th live migration. They range from exceptions saying that the domain is not running anymore to complete JVM crashes (http://pastebin.com/jT6sXubu). Recovery from the exceptions doesn't work reliably either: afterwards, connections to a host can't be shut down properly because there are still open references.
> Of course, in every iteration of my monitoring thread I check that the domain object is not null, that it is still active, that the jobInfo is already available, etc. But as I cannot synchronize with vm.migrate(), there is still a reasonable chance that migrate() invalidates the current domain while I'm accessing it, no matter what I do.

At the C level, every API in libvirt is thread-safe. The one key point is that if you use objects (e.g. virDomainPtr) from multiple threads, you ought to hold an extra reference on them (virDomainRef) per thread, to ensure that one thread does not delete an object that is still in use by another thread.

At the Java level, this reference handling ought to work automatically, so you shouldn't need to do anything special to safely do a migration with 2 threads as you describe. So I don't really have any explanation for what you see.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
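
To illustrate the "one extra reference per thread" rule described in the reply, here is a minimal Java sketch. It does not use the libvirt Java bindings at all: the Handle class, its ref()/unref()/query() methods and the PerThreadRefDemo class are hypothetical stand-ins that only model how virDomainRef and virDomainFree behave at the C level (increment the count, decrement it and release on the last reference). It is meant to show the pattern, not how the bindings are actually implemented.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class PerThreadRefDemo {

    /** Hypothetical stand-in for a native handle such as virDomainPtr. */
    static final class Handle {
        // Starts at 1: the reference owned by the thread that created the handle.
        private final AtomicInteger refs = new AtomicInteger(1);
        private volatile boolean freed = false;

        /** Take an extra reference (models virDomainRef). */
        void ref() {
            while (true) {
                int n = refs.get();
                if (n == 0) {
                    throw new IllegalStateException("handle already freed");
                }
                if (refs.compareAndSet(n, n + 1)) {
                    return;
                }
            }
        }

        /** Drop one reference (models virDomainFree); the last drop releases the handle. */
        void unref() {
            if (refs.decrementAndGet() == 0) {
                freed = true;
            }
        }

        boolean isFreed() {
            return freed;
        }

        /** Models any API call on the handle; fails if the handle was released. */
        String query() {
            if (freed) {
                throw new IllegalStateException("use after free");
            }
            return "running";
        }
    }

    /** Returns true if the monitor thread could still use the handle after the owner let go. */
    static boolean demo() {
        Handle dom = new Handle();
        dom.ref(); // extra reference taken on behalf of the monitor thread, before it starts

        AtomicBoolean failed = new AtomicBoolean(false);
        CountDownLatch ownerDone = new CountDownLatch(1);

        Thread monitor = new Thread(() -> {
            try {
                ownerDone.await(); // wait until the owner has dropped its reference
                dom.query();       // still valid: the monitor holds its own reference
            } catch (IllegalStateException | InterruptedException e) {
                failed.set(true);
            } finally {
                dom.unref();       // the monitor releases its reference when it is done
            }
        });
        monitor.start();

        dom.unref();               // owner is finished, e.g. after the migration returned
        ownerDone.countDown();

        try {
            monitor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !failed.get() && dom.isFreed();
    }

    public static void main(String[] args) {
        System.out.println(demo() ? "handle stayed valid until the last unref"
                                  : "handle was released too early");
    }
}
```

The point of the sketch is the ordering: the extra reference is taken before the second thread starts, and each thread drops only the reference it owns. If the monitor thread borrowed the owner's reference instead, the owner's unref() would release the handle while the monitor was still using it, which is the kind of race described in the question.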