[libvirt] Crash state and QEMU

Hi,
As far as I can tell, if QEMU exits abruptly or with a non-zero status code, libvirt treats this as a domain destruction, giving no real indication to the user that something bad happened.
But libvirt does have a crashed state for domains, it's just not used for QEMU guests.
I was wondering how intentional a design decision this was. Right now there's no good way for a management tool to detect a crashed guest/QEMU. Is there something I'm overlooking?
Regards,
Anthony Liguori

On Tue, May 10, 2011 at 01:26:39PM -0500, Anthony Liguori wrote:
> As far as I can tell, if QEMU exits abruptly or with a non-zero status code, libvirt treats this as a domain destruction, giving no real indication to the user that something bad happened.
libvirtd raises an event. There is (was?) a "reason" argument (eg. "reason" == "watchdog fired"). I've a vague recollection this was discussed but never added. I can't find it in the code right now, but I might be looking for the wrong thing ...
> But libvirt does have a crashed state for domains, it's just not used for QEMU guests.
I'll just make a historical note that the crashed state corresponded to a state in Xen. Essentially the states in libvirt are directly mapped to the ones listed in the Xen xm man page here: http://linux.die.net/man/1/xm
> I was wondering how intentional a design decision this was. Right now there's no good way for a management tool to detect a crashed guest/QEMU. Is there something I'm overlooking?
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora

On 05/10/2011 02:04 PM, Richard W.M. Jones wrote:
> On Tue, May 10, 2011 at 01:26:39PM -0500, Anthony Liguori wrote:
>> As far as I can tell, if QEMU exits abruptly or with a non-zero status code, libvirt treats this as a domain destruction, giving no real indication to the user that something bad happened.
> libvirtd raises an event.
But that event is no different than the event fired for a normal guest shutdown, no?
> There is (was?) a "reason" argument (eg. "reason" == "watchdog fired"). I've a vague recollection this was discussed but never added. I can't find it in the code right now, but I might be looking for the wrong thing ...
>> But libvirt does have a crashed state for domains, it's just not used for QEMU guests.
> I'll just make a historical note that the crashed state corresponded to a state in Xen. Essentially the states in libvirt are directly mapped to the ones listed in the Xen xm man page here:
I'm well aware of that :-) That's why I asked whether not using the crash state was intentional (if it's deprecated as a general API).
Regards,
Anthony Liguori
>> I was wondering how intentional a design decision this was. Right now there's no good way for a management tool to detect a crashed guest/QEMU. Is there something I'm overlooking?
> Rich.

On Tue, May 10, 2011 at 14:28:20 -0500, Anthony Liguori wrote:
> On 05/10/2011 02:04 PM, Richard W.M. Jones wrote:
>> On Tue, May 10, 2011 at 01:26:39PM -0500, Anthony Liguori wrote:
>>> As far as I can tell, if QEMU exits abruptly or with a non-zero status code, libvirt treats this as a domain destruction, giving no real indication to the user that something bad happened.
>> libvirtd raises an event.
> But that event is no different than the event fired for a normal guest shutdown, no?
It is different: after a normal shutdown libvirt issues a (VIR_DOMAIN_EVENT_STOPPED, VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN) event, while a crashed qemu results in a (VIR_DOMAIN_EVENT_STOPPED, VIR_DOMAIN_EVENT_STOPPED_FAILED) event.
Moreover, once the "Introduce virDomainGetState API" patch set makes it into libvirt, there will be a way to detect a crashed domain offline without listening to events, e.g.:
    # virsh domstate --reason $DOM
    shut off (crashed)
However, the shut off reason is not maintained across libvirtd restarts, so after restarting libvirtd the reason will be unknown.
>>> But libvirt does have a crashed state for domains, it's just not used for QEMU guests.
>> I'll just make a historical note that the crashed state corresponded to a state in Xen. Essentially the states in libvirt are directly mapped to the ones listed in the Xen xm man page here:
> I'm well aware of that :-)
> That's why I asked whether not using the crash state was intentional (if it's deprecated as a general API).
IIRC, the crashed state as used in Xen is similar to zombie processes. That is, a crashed domain still exists in the system and one needs to get rid of it using virsh destroy when it's no longer needed. We don't have anything like that for qemu: if it crashes, it just disappears, so the state is shown as shutoff.
Jirka
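
For a management tool, the distinction Jirka describes comes down to checking the (event, detail) pair delivered with a lifecycle event. The following is only a rough sketch of that check, not libvirt code. To stay runnable without the libvirt bindings it defines local stand-ins for the constants; the numeric values are assumptions for illustration, and real code should use the names exported by the `libvirt` module (for example `libvirt.VIR_DOMAIN_EVENT_STOPPED`). The function name `classify_stop` is hypothetical.

```python
from enum import IntEnum


class EventType(IntEnum):
    # Stand-in for VIR_DOMAIN_EVENT_STOPPED; value assumed for illustration.
    STOPPED = 5


class StoppedDetail(IntEnum):
    # Stand-ins for VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN and
    # VIR_DOMAIN_EVENT_STOPPED_FAILED; values assumed for illustration.
    SHUTDOWN = 0
    FAILED = 5


def classify_stop(event, detail):
    """Map a lifecycle (event, detail) pair to a coarse label.

    Returns None for anything that is not a STOPPED event.
    """
    if event != EventType.STOPPED:
        return None
    if detail == StoppedDetail.SHUTDOWN:
        return "clean shutdown"
    if detail == StoppedDetail.FAILED:
        return "failed"
    # Other stop details exist but are not covered by this sketch.
    return "other stop"


def lifecycle_callback(conn, dom, event, detail, seen):
    # Shaped like a libvirt-python lifecycle callback; 'seen' is the opaque
    # argument, used here as a plain list that collects the labels.
    label = classify_stop(event, detail)
    if label is not None:
        seen.append(label)
```

With the real bindings, a callback of this shape would typically be registered through `conn.domainEventRegisterAny()` for lifecycle events while an event loop is running. As noted above, the stop reason reported by `virsh domstate --reason` is not kept across libvirtd restarts, so a tool that needs a durable record would have to store the label itself.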

On Wed, May 11, 2011 at 09:48:24AM +0200, Jiri Denemark wrote:
> On Tue, May 10, 2011 at 14:28:20 -0500, Anthony Liguori wrote:
>> On 05/10/2011 02:04 PM, Richard W.M. Jones wrote:
>>> On Tue, May 10, 2011 at 01:26:39PM -0500, Anthony Liguori wrote:
>>>> As far as I can tell, if QEMU exits abruptly or with a non-zero status code, libvirt treats this as a domain destruction, giving no real indication to the user that something bad happened.
>>> libvirtd raises an event.
>> But that event is no different than the event fired for a normal guest shutdown, no?
> It is different: after a normal shutdown libvirt issues a (VIR_DOMAIN_EVENT_STOPPED, VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN) event, while a crashed qemu results in a (VIR_DOMAIN_EVENT_STOPPED, VIR_DOMAIN_EVENT_STOPPED_FAILED) event.
> Moreover, once the "Introduce virDomainGetState API" patch set makes it into libvirt, there will be a way to detect a crashed domain offline without listening to events, e.g.:
>     # virsh domstate --reason $DOM
>     shut off (crashed)
> However, the shut off reason is not maintained across libvirtd restarts, so after restarting libvirtd the reason will be unknown.
>>>> But libvirt does have a crashed state for domains, it's just not used for QEMU guests.
>>> I'll just make a historical note that the crashed state corresponded to a state in Xen. Essentially the states in libvirt are directly mapped to the ones listed in the Xen xm man page here:
>> I'm well aware of that :-)
>> That's why I asked whether not using the crash state was intentional (if it's deprecated as a general API).
> IIRC, the crashed state as used in Xen is similar to zombie processes. That is, a crashed domain still exists in the system and one needs to get rid of it using virsh destroy when it's no longer needed. We don't have anything like that for qemu: if it crashes, it just disappears, so the state is shown as shutoff.
The only way I could imagine getting equivalent to Xen would be a sick gross hack like...
 - QEMU installs SIG handlers for SEGV, BUS & ABORT
 - The handlers simply do kill(0, STOP).
 - libvirt gets some kind of event notification (unclear how) and marks guest as crashed.
Pretty much the only time this is useful is for developers who want to catch the crash and attach to the QEMU process with GDB, to do something they couldn't do with a plain core dump.
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
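
The "stop instead of die" idea in Daniel's list can be tried out in miniature. The sketch below is a toy illustration under several assumptions, not a proposal for QEMU or libvirt. It is written in Python for brevity, so it only reacts to a SIGABRT sent from outside rather than a genuine SEGV/BUS fault, which a real implementation would have to handle in C inside QEMU. It stops only the current process with `os.getpid()` instead of the `kill(0, STOP)` from the list, so that the demo's parent is not stopped too. For the "unclear how" notification step it assumes the observer is the parent process and uses `waitpid` with `WUNTRACED`, which may not match how libvirt actually supervises QEMU. All function names are hypothetical.

```python
import os
import signal
import time


def stop_self(signum, frame):
    # Freeze this process in place instead of letting it exit, so that a
    # debugger could still attach to it afterwards.
    os.kill(os.getpid(), signal.SIGSTOP)


def child_stops_on_abort():
    """Fork a child that stops itself on SIGABRT; return True if the
    parent observes the child in the stopped state."""
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: plays the role of the crashing process.
        try:
            os.close(ready_r)
            signal.signal(signal.SIGABRT, stop_self)
            os.write(ready_w, b"1")  # tell the parent the handler is installed
            while True:
                time.sleep(0.1)
        finally:
            os._exit(1)
    # Parent: plays the role of the supervising tool.
    os.close(ready_w)
    os.read(ready_r, 1)  # wait until the child is ready
    os.close(ready_r)
    os.kill(pid, signal.SIGABRT)  # simulate the fatal signal
    _, status = os.waitpid(pid, os.WUNTRACED)
    stopped = os.WIFSTOPPED(status)
    # Clean up: a stopped process can still be killed and then reaped.
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    return stopped
```

Calling `child_stops_on_abort()` on a POSIX system should report that the child was seen as stopped rather than exited, which is the property that would let a supervisor mark the guest as crashed while the process is still around for inspection.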
participants (4)
- Anthony Liguori
- Daniel P. Berrange
- Jiri Denemark
- Richard W.M. Jones