Re: [libvirt] [RFC PATCH] Use PAUSED state for domains that are starting up

20 Feb 2015


On Thu, Feb 19, 2015 at 05:07:45PM +0100, Jiri Denemark wrote:
> On Mon, Feb 16, 2015 at 15:07:19 +0000, Daniel P. Berrange wrote:
> > On Mon, Feb 16, 2015 at 04:03:50PM +0100, Jiri Denemark wrote:
> > > On Mon, Feb 16, 2015 at 14:57:17 +0000, Daniel P. Berrange wrote:
> > > > On Mon, Feb 16, 2015 at 03:50:41PM +0100, Jiri Denemark wrote:
> > > > > When libvirt is starting a domain, it reports the state as SHUTOFF until
> > > > > it's RUNNING. This is not ideal because domain startup may take a long
> > > > > time (usually because of configuration issues, firewalls blocking
> > > > > access to network disks, etc.) and domain lists provided by libvirt look
> > > > > awkward. One can see weird shutoff domains with IDs in a list of active
> > > > > domains, or even shutoff transient domains. In any case, it looks more
> > > > > like a bug in libvirt than a normal state a domain goes through.
> > > >
> > > > A shutoff transient domain isn't too bad IMHO, but a shutoff domain
> > > > with an ID number is definitely not expected.
> > > >
> > > > Could we perhaps address it by ensuring that we always return '-1'
> > > > for the ID if the state is SHUTOFF, even if def->id has a positive
> > > > value?
> > >
> > > But we should make it clear that the domain is actually there in some
> > > form, just not completely usable. That is, one may need to call
> > > virsh destroy on such a domain to get rid of the leftover process if
> > > something goes wrong.
> >
> > Hmm, if something goes wrong during virDomainStart, though, we should be
> > tearing down the QEMU process. IIRC we should even be using kill -9 on
> > QEMU, so even if QEMU is stuck in an uninterruptible sleep and won't
> > exit, it will exit without further intervention once the (storage?)
> > problem causing that sleep is resolved. Similarly, calling 'destroy'
> > again won't make it any more likely to quit once it has had a SIGKILL.
>
> You're right, of course. However, I still feel we should distinguish a
> shutoff domain from a domain that is being started. Considering it
> shutoff until we have a monitor connection may cause all sorts of
> confusion. Besides shutoff transient domains, one can see a shutoff
> domain that cannot be started because it is already running (or perhaps
> because acquiring a job fails). It's also impossible to distinguish a
> domain which was running previously and wasn't cleaned up for whatever
> reason (most likely a bug in libvirt) from the normal state where
> libvirt is waiting for a monitor to show up...

It kind of feels like it merits a new state, but I fear that would cause
more problems for existing apps which won't be expecting it. So perhaps
using 'paused' during startup is the least worst option?

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|