[libvirt] Making panic great again

The panic device is currently documented as a way for "libvirt to receive panic notification from a QEMU guest".
This is true, but not the whole story. When a guest triggers the panic device, QEMU pauses the guest, and libvirt takes the action specified by on_crash. This can interfere with the guest's own crash handling actions (e.g. writing a dump file and rebooting itself) if the guest triggers the panic device first (as Windows does).
None of this is an obvious side effect of a notification mechanism, so the panic device documentation should mention it. (I'll send a documentation patch shortly.)
Nor is this a desirable side effect, for guests that are configured to deal with crashes themselves. Sure, you can avoid using the panic device with such guests, but then virsh list or another application using the libvirt API to monitor domain state won't notice guest crashes. And if you still want libvirt to take action on guests that don't do it themselves, then you have to be careful to include the panic device only for those domains.
Ideally libvirt would offer (1) a state indicating "this guest crashed and needs help" independently of triggering an action, and (2) a way to trigger an action only when needed to recover from the crash, excluding guests that deal with their own crashes.
Sadly pvpanic and the HyperV crash MSR convey only that the guest crashed, not whether the guest is configured to take some action on its own. So there's no way to know precisely that a crashed (and not paused) guest is in need of assistance.
But a state indicating "this guest crashed N minutes ago and hasn't rebooted itself" would be a useful approximation. And triggering an action N minutes after a guest crash if it hasn't rebooted itself in the meantime would make it easy to cap the downtime of crashed domains. Both could be implemented without changing either QEMU or panic device semantics.
Does this seem useful to anyone else?
--Ed
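
As a rough illustration of the approximation described above (a guest that "crashed N minutes ago and hasn't rebooted itself"), the bookkeeping might look like the Python sketch below. `CrashTracker` and its method names are hypothetical and are not part of libvirt; the caller is assumed to feed it panic and reboot notifications from whatever event source is available.

```python
import time


class CrashTracker:
    """Remembers when each domain last panicked (hypothetical helper, not a libvirt API)."""

    def __init__(self, grace_seconds, clock=time.monotonic):
        self.grace_seconds = grace_seconds
        self._clock = clock        # injectable so the logic can be exercised without waiting
        self._crashed_at = {}      # domain name -> time of the first unrecovered panic

    def note_panic(self, domain):
        # Keep the time of the first panic; repeated notifications do not extend the grace period.
        if domain not in self._crashed_at:
            self._crashed_at[domain] = self._clock()

    def note_reboot(self, domain):
        # The guest recovered on its own, so forget the crash.
        self._crashed_at.pop(domain, None)

    def needs_help(self, domain):
        """True if the domain panicked at least grace_seconds ago and has not rebooted since."""
        crashed_at = self._crashed_at.get(domain)
        if crashed_at is None:
            return False
        return self._clock() - crashed_at >= self.grace_seconds
```

A monitoring tool could poll `needs_help()` to display the proposed state, independently of whether any recovery action is ever triggered.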

On Thu, Apr 27, 2017 at 05:34:21PM -0700, Ed Swierk wrote:
> The panic device is currently documented as a way for "libvirt to receive panic notification from a QEMU guest".
> This is true, but not the whole story. When a guest triggers the panic device, QEMU pauses the guest, and libvirt takes the action specified by on_crash. This can interfere with the guest's own crash handling actions (e.g. writing a dump file and rebooting itself) if the guest triggers the panic device first (as Windows does).
> None of this is an obvious side effect of a notification mechanism, so the panic device documentation should mention it. (I'll send a documentation patch shortly.)
> Nor is this a desirable side effect, for guests that are configured to deal with crashes themselves. Sure, you can avoid using the panic device with such guests, but then virsh list or another application using the libvirt API to monitor domain state won't notice guest crashes. And if you still want libvirt to take action on guests that don't do it themselves, then you have to be careful to include the panic device only for those domains.
> Ideally libvirt would offer (1) a state indicating "this guest crashed and needs help" independently of triggering an action, and (2) a way to trigger an action only when needed to recover from the crash, excluding guests that deal with their own crashes.
> Sadly pvpanic and the HyperV crash MSR convey only that the guest crashed, not whether the guest is configured to take some action on its own. So there's no way to know precisely that a crashed (and not paused) guest is in need of assistance.
> But a state indicating "this guest crashed N minutes ago and hasn't rebooted itself" would be a useful approximation. And triggering an action N minutes after a guest crash if it hasn't rebooted itself in the meantime would make it easy to cap the downtime of crashed domains. Both could be implemented without changing either QEMU or panic device semantics.
> Does this seem useful to anyone else?
I'm trying to understand the situation. So you have a guest that handles crashes itself (like kdump, let's say), but on_crash options are not enough for you:
- preserve is bad because the guest is not available until someone restarts it
- restart is bad because it doesn't keep the dump anywhere?
- coredump-restart is bad because it doesn't keep the internal dump?
I have no usage for this, currently, so I'm not the right one to discuss this, but I feel like you want the guest-handled crash to be uploaded or saved somewhere and then have libvirt just restart it. Or configure the guest not to handle crashes and set on_crash to coredump-restart.
If none of those is working for you and you really need a special case, it is doable with a short script atop of libvirt.
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
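
For reference, the settings discussed in this message live in the domain XML: the `<on_crash>` element and a `<panic>` device under `<devices>`. The Python sketch below only builds that fragment with the standard library. `crash_config` is a hypothetical helper, and the set of accepted values reflects my reading of the libvirt domain XML documentation, so it should be checked against the libvirt version in use.

```python
import xml.etree.ElementTree as ET

# on_crash values as I understand the libvirt domain XML docs (an assumption; verify locally).
ON_CRASH_ACTIONS = {
    "destroy", "restart", "preserve", "rename-restart",
    "coredump-destroy", "coredump-restart",
}


def crash_config(on_crash, panic_model="isa"):
    """Return a <domain> fragment with on_crash and a panic device (hypothetical helper)."""
    if on_crash not in ON_CRASH_ACTIONS:
        raise ValueError(f"unsupported on_crash action: {on_crash!r}")
    domain = ET.Element("domain")
    ET.SubElement(domain, "on_crash").text = on_crash
    devices = ET.SubElement(domain, "devices")
    ET.SubElement(devices, "panic", {"model": panic_model})
    return ET.tostring(domain, encoding="unicode")


print(crash_config("coredump-restart"))
```

Note that the set contains no "resume and let the guest carry on" value, which is the gap this thread is about.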

On Fri, Apr 28, 2017 at 08:43:30AM +0200, Martin Kletzander wrote:
> I'm trying to understand the situation. So you have a guest that handles crashes itself (like kdump, let's say), but on_crash options are not enough for you:
> - preserve is bad because the guest is not available until someone restarts it
> - restart is bad because it doesn't keep the dump anywhere?
> - coredump-restart is bad because it doesn't keep the internal dump?
> I have no usage for this, currently, so I'm not the right one to discuss this, but I feel like you want the guest-handled crash to be uploaded or saved somewhere and then have libvirt just restart it. Or configure the guest not to handle crashes and set on_crash to coredump-restart.
With the watchdog we have a much wider set of actions that we can instruct QEMU to do:
- 'reset': default, forcefully reset the guest
- 'shutdown': gracefully shutdown the guest (not recommended)
- 'poweroff': forcefully power off the guest
- 'pause': pause the guest
- 'none': do nothing
- 'dump': automatically dump the guest (since 0.8.7)
- 'inject-nmi': inject a non-maskable interrupt into the guest (since 1.2.17)
IMHO, we need an RFE against QEMU to allow pvpanic to have the same kind of configurability as watchdog. So instead of QEMU blindly pausing the guest, it can be told 'none' at which point libvirt can emit the event and the admin can decide what further action they wish to take, if any.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
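
To make the proposal concrete, here is a small Python sketch of how a panic notification could be dispatched if it accepted the same action names as the watchdog. Everything in it is hypothetical: `on_guest_panic`, the `handlers` mapping and the `"panicked"` event label are illustrative names, and no such configurability exists for pvpanic in the discussion above.

```python
# Action names copied from the watchdog list above.
WATCHDOG_ACTIONS = ("reset", "shutdown", "poweroff", "pause", "none", "dump", "inject-nmi")


def on_guest_panic(action, emit_event, handlers):
    """Emit the panic event, then run the configured action (hypothetical policy hook).

    Returns True if an action was carried out, False if the guest was left alone.
    """
    if action not in WATCHDOG_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    emit_event("panicked")      # the management layer is always notified first
    if action == "none":
        return False            # guest keeps running; the admin decides what to do next
    handlers[action]()          # caller supplies one callable per action it supports
    return True
```

With `action="none"` the event is still delivered, which is the property the proposal is after: notification without an enforced pause.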

On Fri, Apr 28, 2017 at 1:46 AM, Daniel P. Berrange <berrange@redhat.com> wrote:
> With the watchdog we have a much wider set of actions that we can instruct QEMU to do:
> - 'reset': default, forcefully reset the guest
> - 'shutdown': gracefully shutdown the guest (not recommended)
> - 'poweroff': forcefully power off the guest
> - 'pause': pause the guest
> - 'none': do nothing
> - 'dump': automatically dump the guest (since 0.8.7)
> - 'inject-nmi': inject a non-maskable interrupt into the guest (since 1.2.17)
> IMHO, we need an RFE against QEMU to allow pvpanic to have the same kind of configurability as watchdog. So instead of QEMU blindly pausing the guest, it can be told 'none' at which point libvirt can emit the event and the admin can decide what further action they wish to take, if any.
I agree the panic device should support the same actions as the watchdog does.
All of them could be implemented without changing QEMU, though. For example libvirt could implement the 'none' action by resuming the guest. That means the guest still gets paused briefly, but I'd view fixing this as more of a performance optimization than a prerequisite for supporting the full set of actions.
--Ed

On Fri, Apr 28, 2017 at 11:28:47AM -0700, Ed Swierk wrote:
> I agree the panic device should support the same actions as the watchdog does.
> All of them could be implemented without changing QEMU, though. For example libvirt could implement the 'none' action by resuming the guest. That means the guest still gets paused briefly, but I'd view fixing this as more of a performance optimization than a prerequisite for supporting the full set of actions.
I'd prefer to have explicit QEMU support, because there is less chance of libvirt's hack/workaround being accidentally broken by future QEMU changes.
Regards,
Daniel

On Thu, Apr 27, 2017 at 11:43 PM, Martin Kletzander <mkletzan@redhat.com> wrote:
> I'm trying to understand the situation. So you have a guest that handles crashes itself (like kdump, let's say), but on_crash options are not enough for you:
> - preserve is bad because the guest is not available until someone restarts it
> - restart is bad because it doesn't keep the dump anywhere?
> - coredump-restart is bad because it doesn't keep the internal dump?
> I have no usage for this, currently, so I'm not the right one to discuss this, but I feel like you want the guest-handled crash to be uploaded or saved somewhere and then have libvirt just restart it. Or configure the guest not to handle crashes and set on_crash to coredump-restart.
> If none of those is working for you and you really need a special case, it is doable with a short script atop of libvirt.
Windows all the way back to XP has handled crashes itself by writing a dump file to disk. This is not a complete coredump but a special format that can be read by a variety of tools to extract useful information for diagnosing the crash. A libvirt-generated coredump would be much less useful for experienced Windows admins.
After writing the dump file, Windows can automatically reboot itself. This has been the default behavior since at least Windows Server 2003 and Vista, and experienced Windows admins rely on it.
For Windows guests, all I want libvirt to do when it receives a panic notification from QEMU is resume the guest, so it can write the dump file and reboot itself automatically. None of the on_crash actions allow this.
And as a failsafe for guests not configured to automatically reboot (Windows or otherwise), it would be nice if libvirt had an on_crash action that resumes the guest immediately, and reboots the guest after some configurable timeout if the guest doesn't reboot itself first. I'd settle for implementing this more complicated policy in a script, but libvirt would at least need to remember the time of the crash and expose that through its API.
--Ed

On Fri, Apr 28, 2017 at 11:23:11AM -0700, Ed Swierk wrote:
> Windows all the way back to XP has handled crashes itself by writing a dump file to disk. This is not a complete coredump but a special format that can be read by a variety of tools to extract useful information for diagnosing the crash. A libvirt-generated coredump would be much less useful for experienced Windows admins.
> After writing the dump file, Windows can automatically reboot itself. This has been the default behavior since at least Windows Server 2003 and Vista, and experienced Windows admins rely on it.
> For Windows guests, all I want libvirt to do when it receives a panic notification from QEMU is resume the guest, so it can write the dump file and reboot itself automatically. None of the on_crash actions allow this.
Thanks for the explanation, now I understand what the problem is.
> And as a failsafe for guests not configured to automatically reboot (Windows or otherwise), it would be nice if libvirt had an on_crash action that resumes the guest immediately, and reboots the guest after some configurable timeout if the guest doesn't reboot itself first. I'd settle for implementing this more complicated policy in a script, but libvirt would at least need to remember the time of the crash and expose that through its API.
Yeah, we don't support additional information for states (e.g. the time the state was last changed). It is visible from the logs, but that's not something someone should parse to figure this out. I agree that we could support more options for on_panic. I'm not sure how QEMU handles resumes in various cases, but it should be fine anyway. Feel free to create a request in bugzilla [1] so that we don't forget about it accidentally.
In the meantime, the script should be pretty easy to cook up. Just listen for events; when you get PANICKED, note the time and resume the guest. For the reboot after that (in case the guest does not reboot itself), I would expect you to be able to use watchdog. If you can't, you can wait for a 'reboot' event (with a new enough QEMU this is an arbitrary event passed through libvirt) and, if you don't get it in the amount of time you expect, just reset the VM.
Martin
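
A minimal Python sketch of the script outlined above follows. The policy class `PanicWatcher` is hypothetical and deliberately independent of libvirt so that it can be tested on its own. The `main()` wiring uses libvirt-python calls that I believe exist with these signatures (`domainEventRegisterAny`, `virEventAddTimeout`, `resume`, `reset`), but it is untested here, and whether a panicked guest can actually be resumed this way is an assumption that the message above also leaves open.

```python
import time


class PanicWatcher:
    """On panic: note the time and resume; reset later if no reboot was seen (hypothetical)."""

    def __init__(self, timeout, resume, reset, clock=time.monotonic):
        self.timeout = timeout
        self.resume = resume          # callable(name): let the guest continue
        self.reset = reset            # callable(name): forcefully reset the guest
        self.clock = clock
        self.deadlines = {}           # domain name -> time by which a reboot is expected

    def panicked(self, name):
        if name not in self.deadlines:
            self.deadlines[name] = self.clock() + self.timeout
        self.resume(name)             # give the guest a chance to write its dump and reboot

    def rebooted(self, name):
        self.deadlines.pop(name, None)

    def tick(self):
        # Reset every guest whose deadline has passed without a reboot event.
        now = self.clock()
        for name in [n for n, d in self.deadlines.items() if d <= now]:
            del self.deadlines[name]
            self.reset(name)


def main(uri="qemu:///system", timeout=300):
    import libvirt  # libvirt-python; imported lazily so the policy above runs without it

    libvirt.virEventRegisterDefaultImpl()
    conn = libvirt.open(uri)
    watcher = PanicWatcher(
        timeout,
        resume=lambda name: conn.lookupByName(name).resume(),
        reset=lambda name: conn.lookupByName(name).reset(0),
    )

    def lifecycle(conn, dom, event, detail, opaque):
        if (event == libvirt.VIR_DOMAIN_EVENT_CRASHED
                and detail == libvirt.VIR_DOMAIN_EVENT_CRASHED_PANICKED):
            watcher.panicked(dom.name())

    def reboot(conn, dom, opaque):
        watcher.rebooted(dom.name())

    conn.domainEventRegisterAny(None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE, lifecycle, None)
    conn.domainEventRegisterAny(None, libvirt.VIR_DOMAIN_EVENT_ID_REBOOT, reboot, None)
    # Periodic timer so deadlines are checked even when no events arrive.
    libvirt.virEventAddTimeout(1000, lambda timer, opaque: watcher.tick(), None)
    while True:
        libvirt.virEventRunDefaultImpl()
```

Calling `main()` on a host with libvirt-python installed would start the watcher. The timeout should be long enough for the guest to finish writing its dump file before it is reset.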

On 04/28/2017 02:34 AM, Ed Swierk wrote:
> The panic device is currently documented as a way for "libvirt to receive panic notification from a QEMU guest".
> This is true, but not the whole story. When a guest triggers the panic device, QEMU pauses the guest, and libvirt takes the action specified by on_crash. This can interfere with the guest's own crash handling actions (e.g. writing a dump file and rebooting itself) if the guest triggers the panic device first (as Windows does).
> None of this is an obvious side effect of a notification mechanism, so the panic device documentation should mention it. (I'll send a documentation patch shortly.)
> Nor is this a desirable side effect, for guests that are configured to deal with crashes themselves. Sure, you can avoid using the panic device with such guests, but then virsh list or another application using the libvirt API to monitor domain state won't notice guest crashes. And if you still want libvirt to take action on guests that don't do it themselves, then you have to be careful to include the panic device only for those domains.
> Ideally libvirt would offer (1) a state indicating "this guest crashed and needs help" independently of triggering an action, and (2) a way to trigger an action only when needed to recover from the crash, excluding guests that deal with their own crashes.
> Sadly pvpanic and the HyperV crash MSR convey only that the guest crashed, not whether the guest is configured to take some action on its own. So there's no way to know precisely that a crashed (and not paused) guest is in need of assistance.
> But a state indicating "this guest crashed N minutes ago and hasn't rebooted itself" would be a useful approximation. And triggering an action N minutes after a guest crash if it hasn't rebooted itself in the meantime would make it easy to cap the downtime of crashed domains. Both could be implemented without changing either QEMU or panic device semantics.
> Does this seem useful to anyone else?
On s390 we only have a "pseudo" panic device. Our guests load a disabled wait PSW to indicate a crash. This is wired up in QEMU as panic state and thus notifies libvirt that the guest is in crashed state. If the guest does kdump or similar, it will never load a disabled wait PSW. So from my perspective this works exactly as I would like it to behave, but I find it interesting that others seem to trigger the panic device even if the guest handles the crash itself.

Participants (4):
- Christian Borntraeger
- Daniel P. Berrange
- Ed Swierk
- Martin Kletzander