
On Thu, Feb 19, 2015 at 05:07:45PM +0100, Jiri Denemark wrote:
On Mon, Feb 16, 2015 at 15:07:19 +0000, Daniel P. Berrange wrote:
On Mon, Feb 16, 2015 at 04:03:50PM +0100, Jiri Denemark wrote:
On Mon, Feb 16, 2015 at 14:57:17 +0000, Daniel P. Berrange wrote:
On Mon, Feb 16, 2015 at 03:50:41PM +0100, Jiri Denemark wrote:
When libvirt is starting a domain, it reports the state as SHUTOFF until it's RUNNING. This is not ideal because domain startup may take a long time (usually because of some configuration issues, firewalls blocking access to network disks, etc.) and domain lists provided by libvirt look awkward. One can see weird shutoff domains with IDs in a list of active domains or even shutoff transient domains. In any case, it looks more like a bug in libvirt than a normal state a domain goes through.
A shutoff transient domain isn't too bad IMHO, but a shutoff domain with an ID number is definitely not expected.
Could we perhaps address it by ensuring that we always return '-1' for the ID if the state is "SHUTOFF", even if def->id has a positive value?
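To make the suggestion concrete, here is a minimal sketch of the lookup logic in Python rather than libvirt's C. `DomainObj`, `def_id`, and `reported_id` are hypothetical names used only for illustration (they are not libvirt APIs); the two state constants use the numeric values of libvirt's virDomainState enum.

```python
from dataclasses import dataclass

# Numeric values as in libvirt's virDomainState enum.
VIR_DOMAIN_RUNNING = 1
VIR_DOMAIN_SHUTOFF = 5


@dataclass
class DomainObj:
    """Hypothetical stand-in for the driver's internal domain object."""
    name: str
    def_id: int  # models def->id, which may already be set during startup
    state: int


def reported_id(dom: DomainObj) -> int:
    """Return the ID to expose to clients.

    A domain still reported as SHUTOFF always gets -1, even if
    def_id already holds a positive value.
    """
    if dom.state == VIR_DOMAIN_SHUTOFF:
        return -1
    return dom.def_id


starting = DomainObj(name="guest", def_id=7, state=VIR_DOMAIN_SHUTOFF)
running = DomainObj(name="guest", def_id=7, state=VIR_DOMAIN_RUNNING)
print(reported_id(starting), reported_id(running))
```

This only hides the ID from listings; it does not by itself tell a client that a process already exists for the domain.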
But we should somehow make it clear that the domain is actually there, only not completely usable. That is, one may need to actually call virsh destroy on such a domain to get rid of the leftover process if something goes wrong.
Hmm, if something goes wrong during virDomainStart though, we should be tearing down the QEMU process. IIRC we should even be kill -9'ing QEMU, so even if QEMU is stuck in an uninterruptible sleep and won't exit, once the (storage?) problem causing that sleep is resolved QEMU will exit without further intervention. Similarly, calling 'destroy' more times won't make it any more likely to quit once it has had a SIGKILL.
You're right of course. However, I still feel we should distinguish a shutoff domain from a domain that is being started. Considering it shutoff until we have a monitor connection may cause all sorts of confusion. Besides shutoff transient domains, one can see a shutoff domain that cannot be started because it is already running (or perhaps because acquiring a job fails). It's also impossible to distinguish a domain which was running previously and wasn't cleaned up for whatever reason (a bug in libvirt most likely) from the normal state in which libvirt is waiting for a monitor to show up...
It kind of feels like it merits a new state, but I fear that would cause more problems for existing apps which won't be expecting it. So perhaps using 'paused' during startup is the least worst option?

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
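
As a rough illustration of the 'paused during startup' idea, the mapping could look like the sketch below. It is written in Python rather than libvirt's C, and `public_state` and its three boolean parameters are hypothetical names invented for this example; only the numeric constants come from libvirt's virDomainState enum.

```python
# Numeric values as in libvirt's virDomainState enum.
VIR_DOMAIN_RUNNING = 1
VIR_DOMAIN_PAUSED = 3
VIR_DOMAIN_SHUTOFF = 5


def public_state(process_spawned: bool,
                 monitor_connected: bool,
                 cpus_running: bool) -> int:
    """Map hypothetical internal startup flags to a state existing apps know.

    No new public state is introduced: a domain whose process exists but
    whose monitor is not yet connected is reported as PAUSED.
    """
    if not process_spawned:
        return VIR_DOMAIN_SHUTOFF
    if not monitor_connected:
        # Startup in progress: the process is there, but not usable yet.
        return VIR_DOMAIN_PAUSED
    return VIR_DOMAIN_RUNNING if cpus_running else VIR_DOMAIN_PAUSED


print(public_state(True, False, False))
```

With this mapping an application that only understands the existing states sees a starting domain as an active, paused one, so an ID in the list of active domains no longer looks out of place.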