The panic device is currently documented as a way for "libvirt to receive
panic notification from a QEMU guest".
This is true, but not the whole story. When a guest triggers the panic
device, QEMU pauses the guest, and libvirt takes the action specified by
on_crash. This can interfere with the guest's own crash handling actions
(e.g. writing a dump file and rebooting itself) if the guest triggers the
panic device first (as Windows does).
None of this is an obvious side effect of a notification mechanism, so the
panic device documentation should mention it. (I'll send a documentation
patch shortly.)
Nor is this a desirable side effect for guests that are configured to deal
with crashes themselves. Sure, you can avoid using the panic device with
such guests, but then virsh list or another application using the libvirt
API to monitor domain state won't notice guest crashes. And if you still
want libvirt to take action on guests that don't do it themselves, then you
have to be careful to include the panic device only for those domains.
Ideally libvirt would offer (1) a state indicating "this guest crashed and
needs help" independently of triggering an action, and (2) a way to trigger
an action only when needed to recover from the crash, excluding guests that
deal with their own crashes.
Sadly, pvpanic and the Hyper-V crash MSR convey only that the guest crashed,
not whether the guest is configured to take some action on its own. So
there's no way to know precisely that a crashed (and not paused) guest is
in need of assistance.
But a state indicating "this guest crashed N minutes ago and hasn't
rebooted itself" would be a useful approximation. And triggering an action
N minutes after a guest crash if it hasn't rebooted itself in the meantime
would make it easy to cap the downtime of crashed domains. Both could be
implemented without changing either QEMU or panic device semantics.
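To make the idea concrete, here is a rough sketch of the bookkeeping in
Python. It is only an illustration, not a proposed libvirt implementation:
CrashWatchdog and its on_panic, on_reboot, crashed_for and poll methods are
hypothetical names, and the hookup to real panic and reboot notifications is
deliberately left out.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CrashWatchdog:
    """Hypothetical helper: act on a crashed guest only after a grace period."""

    grace_seconds: float                  # the "N minutes", in seconds
    action: Callable[[str], None]         # e.g. restart or destroy the domain
    clock: Callable[[], float] = time.monotonic
    _crashed_at: Dict[str, float] = field(default_factory=dict)

    def on_panic(self, domain: str) -> None:
        # Remember when the guest first reported a crash; take no action yet.
        self._crashed_at.setdefault(domain, self.clock())

    def on_reboot(self, domain: str) -> None:
        # The guest handled its own crash, so forget about it.
        self._crashed_at.pop(domain, None)

    def crashed_for(self, domain: str) -> Optional[float]:
        # Seconds since the crash, or None if the guest is not in crashed state.
        started = self._crashed_at.get(domain)
        return None if started is None else self.clock() - started

    def poll(self) -> List[str]:
        # Run the action on every guest whose grace period has expired.
        now = self.clock()
        expired = [d for d, t in self._crashed_at.items()
                   if now - t >= self.grace_seconds]
        for domain in expired:
            del self._crashed_at[domain]
            self.action(domain)
        return expired
```

A guest that reboots itself within the grace period is never touched, while
one that stays down is handed to the action on the first poll() after the
period ends. crashed_for() is what a monitoring application would use for the
"crashed N minutes ago" state.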
Does this seem useful to anyone else?
--Ed