[libvirt] pvpanic plans?

The thread from yesterday has died off (perhaps also because of my inappropriate answer to Michael, for which I apologize to him and everyone). I took some time to discuss the libvirt requirements further with Daniel Berrange and Eric Blake on IRC. If anyone is interested, I can give logs. This is a suggestion for how to proceed in both QEMU and libvirt. == Builtin pvpanic == QEMU will remove pvpanic from pc-1.5 in 1.6.1 and 1.5.4. This does not break migration. == Support in libvirt for current functionality == libvirt will add a <panic-notifier/> element, and possibly a capability for it accessible via "virsh capabilities". There are two possibilities: 1) On QEMU 1.5.4/1.6.1 and newer (and on QEMU 1.6.0 with a machine type other than pc-1.5), <on_crash> will only work if the element is there. On QEMU 1.5.0->1.5.3, and on QEMU 1.6.0 with the pc-1.5 machine type, <on_crash> will be obeyed always, and may override e.g. reboot-on-panic if a guest driver exist. 2) On all versions, <on_crash> will only work if the element is there. In turn, there are two ways to implement (2): 2a) libvirt will always add -global pvpanic.iobase=0 to neutralize the builtin pvpanic device if present. <panic-notifier/> will create the device with -device pvpanic,iobase=0x505 Advantage: no changes to QEMU Disadvantage 1: writes to port 0 with QEMU 1.{5.0,5.1,5.2,5.3,6.0} and pc-1.5 machine type will write to a pvpanic device instead of the DMA controller. Probably harmless, and limited to some QEMU versions. Disadvantage 2: libvirt has knowledge of the pvpanic port number 2b) QEMU will provide a way for libvirt to detect that no machine type has the builtin pvpanic. If some machine type may have the builtin pvpanic, and <panic-notifier/> is absent, libvirt will add "-global pvpanic.iobase=0" to neutralize it. Otherwise, libvirt will create the device normally. A possible way for libvirt to detect "good" machine types is a dummy property. This is a bit ugly in that the property would not affect the behavior of the device. The property would remain in the long term. Another possibility is for QEMU to rename the device, e.g. to isa-pvpanic. This is also somewhat gross, but not visible in the long term when the "pvpanic" name will be lost in history. Advantage 1: libvirt has no knowledge of the pvpanic port number Disadvantage 1: same as above Disadvantage 2: need a somewhat gross change in QEMU This method also provides an (also somewhat gross on the QEMU side) way to detect other changes in the pvpanic semantics. One example mentioned below, is making the panicked state temporary. == Possible improvements to pvpanic == The current implementation of pvpanic supports three modes: reset system on panic, destroy domain on panic, preserve domain with no possibility to resume it. (Optionally a domain can be dumped too). Long term, the choice to include pvpanic should not be on the guest admin's shoulders, but rather in libosinfo. Thus, it would be nice to have a fourth mode where the panic is logged but the guest otherwise keeps running. This mode would let libosinfo add pvpanic by default without affecting the guest's behavior on panic. With this change, <on_crash>ignore</on_crash> will behave as follows for the three possibilities above: (1) With QEMU 1.5.0 to 1.6.1, <on_crash> will _not_ obey the setting, never (even if no <panic-notifier/> is specified). libvirt will have to pick a fallback action. advantage of destroy as fallback: it is the default (but note that restart is the default for virt-install) advantage of preserve as fallback: lets the admin examine the panic advantage of restart as fallback: maximum availability of the VM, it is the default for virt-install (2a) With QEMU 1.5.0 to 1.6.1, <on_crash> will _not_ obey the setting if <panic-notifier/> is specified. libvirt has _no way_ to learn about this, so the capability would always be present with these QEMU versions and libosinfo would always add <panic-notifier/> with these versions. Given the libosinfo scenario being considered here, this is not very different from (1). (2b) With QEMU 1.5.0 to 1.6.1, the <panic-notifier/> element will not be available and not exposed in libvirt capabilities. Thus with this version libosinfo would omit <panic-notifier/> from the XML. Guest policy will always be followed correctly. The problem in both (1) and (2a) can be summarized as follows. First, libvirt will have to implement and document a fallback action for buggy QEMU. Second, even though the problems would be limited to some version of QEMU, they would be relatively hard to debug for a casual user, could start happening randomly by updating any one of QEMU, libvirt, libosinfo or the guest kernel, and there is no fallback action for libvirt that is always correct. Thus, considering future libosinfo support for pvpanic, (2b) is preferrable in my opinion. Now, making pvpanic reversible requires a change in QEMU (patch already posted). Andreas proposed using a pvpanic property to determine whether the panicked state is temporary or definitive. libvirt could piggyback on such a property to detect the "goodness" of machine types (as mentioned regarding solution 2b above). However: First, this would require a more intrusive patch, less appealing for 1.5 and 1.6 stable branches. Second, there is no reason why libvirt would want to make the panicked state definitive. To achieve the same effect, libvirt can just not issue the "continue" monitor command when the guest is panicked. Thus the new property would be useless except to communicate pvpanic behavior---and renaming the device still seems preferrable to me. Thanks for reading up to here! Paolo

Paolo Bonzini <pbonzini@redhat.com> writes:
The thread from yesterday has died off (perhaps also because of my inappropriate answer to Michael, for which I apologize to him and everyone). I took some time to discuss the libvirt requirements further with Daniel Berrange and Eric Blake on IRC. If anyone is interested, I can give logs. This is a suggestion for how to proceed in both QEMU and libvirt.
== Builtin pvpanic ==
QEMU will remove pvpanic from pc-1.5 in 1.6.1 and 1.5.4. This does not break migration.
pvpanic has been a failure. It's a poorly designed device with even worse semantics. I applied it and I'll take the fault for merging it in the first place. We should simply scrap it and start over. It has so few users at this point that this is still a realistic option. Using something based on ISA that requires specific ACPI entries was a mistake. We should just introduce a simple watchdog device based on virtio and call it a day. Then it's cross platform, solves the guest enumeration problem, and libvirt can detect the presence of the new device. None of the plans outlined below give us a proper solution. I think removing is our best option at this point. Regards, Anthony Liguori
== Support in libvirt for current functionality ==
libvirt will add a <panic-notifier/> element, and possibly a capability for it accessible via "virsh capabilities". There are two possibilities:
1) On QEMU 1.5.4/1.6.1 and newer (and on QEMU 1.6.0 with a machine type other than pc-1.5), <on_crash> will only work if the element is there. On QEMU 1.5.0->1.5.3, and on QEMU 1.6.0 with the pc-1.5 machine type, <on_crash> will be obeyed always, and may override e.g. reboot-on-panic if a guest driver exist.
2) On all versions, <on_crash> will only work if the element is there.
In turn, there are two ways to implement (2):
2a) libvirt will always add -global pvpanic.iobase=0 to neutralize the builtin pvpanic device if present. <panic-notifier/> will create the device with -device pvpanic,iobase=0x505
Advantage: no changes to QEMU
Disadvantage 1: writes to port 0 with QEMU 1.{5.0,5.1,5.2,5.3,6.0} and pc-1.5 machine type will write to a pvpanic device instead of the DMA controller. Probably harmless, and limited to some QEMU versions.
Disadvantage 2: libvirt has knowledge of the pvpanic port number
2b) QEMU will provide a way for libvirt to detect that no machine type has the builtin pvpanic. If some machine type may have the builtin pvpanic, and <panic-notifier/> is absent, libvirt will add "-global pvpanic.iobase=0" to neutralize it. Otherwise, libvirt will create the device normally.
A possible way for libvirt to detect "good" machine types is a dummy property. This is a bit ugly in that the property would not affect the behavior of the device. The property would remain in the long term.
Another possibility is for QEMU to rename the device, e.g. to isa-pvpanic. This is also somewhat gross, but not visible in the long term when the "pvpanic" name will be lost in history.
Advantage 1: libvirt has no knowledge of the pvpanic port number
Disadvantage 1: same as above
Disadvantage 2: need a somewhat gross change in QEMU
This method also provides an (also somewhat gross on the QEMU side) way to detect other changes in the pvpanic semantics. One example mentioned below, is making the panicked state temporary.
== Possible improvements to pvpanic ==
The current implementation of pvpanic supports three modes: reset system on panic, destroy domain on panic, preserve domain with no possibility to resume it. (Optionally a domain can be dumped too).
Long term, the choice to include pvpanic should not be on the guest admin's shoulders, but rather in libosinfo. Thus, it would be nice to have a fourth mode where the panic is logged but the guest otherwise keeps running. This mode would let libosinfo add pvpanic by default without affecting the guest's behavior on panic.
With this change, <on_crash>ignore</on_crash> will behave as follows for the three possibilities above:
(1) With QEMU 1.5.0 to 1.6.1, <on_crash> will _not_ obey the setting, never (even if no <panic-notifier/> is specified).
libvirt will have to pick a fallback action.
advantage of destroy as fallback: it is the default (but note that restart is the default for virt-install)
advantage of preserve as fallback: lets the admin examine the panic
advantage of restart as fallback: maximum availability of the VM, it is the default for virt-install
(2a) With QEMU 1.5.0 to 1.6.1, <on_crash> will _not_ obey the setting if <panic-notifier/> is specified. libvirt has _no way_ to learn about this, so the capability would always be present with these QEMU versions and libosinfo would always add <panic-notifier/> with these versions. Given the libosinfo scenario being considered here, this is not very different from (1).
(2b) With QEMU 1.5.0 to 1.6.1, the <panic-notifier/> element will not be available and not exposed in libvirt capabilities. Thus with this version libosinfo would omit <panic-notifier/> from the XML. Guest policy will always be followed correctly.
The problem in both (1) and (2a) can be summarized as follows. First, libvirt will have to implement and document a fallback action for buggy QEMU. Second, even though the problems would be limited to some version of QEMU, they would be relatively hard to debug for a casual user, could start happening randomly by updating any one of QEMU, libvirt, libosinfo or the guest kernel, and there is no fallback action for libvirt that is always correct.
Thus, considering future libosinfo support for pvpanic, (2b) is preferrable in my opinion.
Now, making pvpanic reversible requires a change in QEMU (patch already posted). Andreas proposed using a pvpanic property to determine whether the panicked state is temporary or definitive. libvirt could piggyback on such a property to detect the "goodness" of machine types (as mentioned regarding solution 2b above). However:
First, this would require a more intrusive patch, less appealing for 1.5 and 1.6 stable branches. Second, there is no reason why libvirt would want to make the panicked state definitive. To achieve the same effect, libvirt can just not issue the "continue" monitor command when the guest is panicked. Thus the new property would be useless except to communicate pvpanic behavior---and renaming the device still seems preferrable to me.
Thanks for reading up to here!
Paolo

On 08/22/13 18:44, Anthony Liguori wrote:
pvpanic has been a failure. It's a poorly designed device with even worse semantics.
I disagree somewhat. Requiring a separate ioport is not ideal, I admit. Configuration over ACPI is good OTOH (it seems to put standards to good use anyway). Noone realized pvpanic had poor technical design until the Windows "new device" wizard popped up -- is that correct? Most of us are probably not habitual Windows users, which is probably why we haven't thought of it earlier. Maybe we shouldn't promise "there won't be guest-visible changes in ACPI contents". If we do promise, maybe we should then make the SeaBIOS binary that we're loading dependent on -M too too. After all, had we managed to completely hide the \_SB.PCI0.ISA.PEVT device programmatically, as opposed to only disabling it, we might have never realized pvpanic had poor design. Which (almost) means it wouldn't have had one. If we selected a SeaBIOS binary based on -M, then we could hide this stuff from Windows.
I applied it and I'll take the fault for merging it in the first place.
We should simply scrap it and start over.
That will kinda Eff some downstreams in the A...
It has so few users at this point that this is still a realistic option. Using something based on ISA that requires specific ACPI entries was a mistake.
We should just introduce a simple watchdog device based on virtio and call it a day. Then it's cross platform, solves the guest enumeration problem, and libvirt can detect the presence of the new device.
If the guest doesn't initialize the proposed virtio-panic device, then it will lie dormant too, just like the current pvpanic device. That's good. However a new (standalone) virtio device will take up yet another PCI function (a full device if you want it to be hotpluggable). PCI functions are scarcer than ioports. It will need documentation in the virtio-spec as well. We'd need an arbitrarily heavily multiplexed paravirt channel between guest and qemu. Maybe a dedicated virtio-serial port that's not exposed to other host processes; one that qemu would "consume" itself. If you want to be able to panic in boot firmware, writing to an ioport is easier than adding a new virtio driver (virtio-serial, or a completely new device).
None of the plans outlined below give us a proper solution. I think removing is our best option at this point.
I'm just trolling ^W playing the devil's advocate here, giving you more opportunity to argue your point :) Thanks, Laszlo

Laszlo Ersek <lersek@redhat.com> writes:
On 08/22/13 18:44, Anthony Liguori wrote:
pvpanic has been a failure. It's a poorly designed device with even worse semantics.
I disagree somewhat.
Requiring a separate ioport is not ideal, I admit. Configuration over ACPI is good OTOH (it seems to put standards to good use anyway).
Noone realized pvpanic had poor technical design until the Windows "new device" wizard popped up -- is that correct? Most of us are probably not habitual Windows users, which is probably why we haven't thought of it earlier.
Generating ACPI tables dynamically is painful and worse yet, it's 100% ACPI specific. Had we used virtio from the start, we would have had a cross-architecture mechanism instead of a one-off x86-ism. Yes, hind sight is 20/20 but that shouldn't stop us from doing things right when presented the opportunity.
Maybe we shouldn't promise "there won't be guest-visible changes in ACPI contents". If we do promise, maybe we should then make the SeaBIOS binary that we're loading dependent on -M too too.
After all, had we managed to completely hide the \_SB.PCI0.ISA.PEVT device programmatically, as opposed to only disabling it, we might have never realized pvpanic had poor design. Which (almost) means it wouldn't have had one.
If we selected a SeaBIOS binary based on -M, then we could hide this stuff from Windows.
Yes, at some point I'm sure we'll hit the need for maintaining multiple copies of SeaBIOS but that's going to make testing all that much harder. The longer we can avoid it the better IMHO.
I applied it and I'll take the fault for merging it in the first place.
We should simply scrap it and start over.
That will kinda Eff some downstreams in the A...
If it's too late then we're stuck with it, but perhaps some of the downstreams can skip up about what level of support they need for the existing device in a bit more detail... AFAICT, we've got something that's fundamentally broken right now so downstreams are already in a bind if they're planning on supporting this device.
It has so few users at this point that this is still a realistic option. Using something based on ISA that requires specific ACPI entries was a mistake.
We should just introduce a simple watchdog device based on virtio and call it a day. Then it's cross platform, solves the guest enumeration problem, and libvirt can detect the presence of the new device.
If the guest doesn't initialize the proposed virtio-panic device, then it will lie dormant too, just like the current pvpanic device. That's good.
Ack.
However a new (standalone) virtio device will take up yet another PCI function (a full device if you want it to be hotpluggable). PCI functions are scarcer than ioports.
We can always use bridges to expand the number of slots available. If we truly run into a situation where slots become too scarce, then we can look at introducing a PCI-to-N-virtio-devices bridge.
It will need documentation in the virtio-spec as well.
Ack.
We'd need an arbitrarily heavily multiplexed paravirt channel between guest and qemu. Maybe a dedicated virtio-serial port that's not exposed to other host processes; one that qemu would "consume" itself.
I don't think using virtio-serial would be a good approach. If we make the panic flag a config space variable, it makes it pretty easy for firmware to use since it's still just an ioport write.
If you want to be able to panic in boot firmware, writing to an ioport is easier than adding a new virtio driver (virtio-serial, or a completely new device).
See above.
None of the plans outlined below give us a proper solution. I think removing is our best option at this point.
I'm just trolling ^W playing the devil's advocate here, giving you more opportunity to argue your point :)
It's really a tremendously simple virtio device to start with. No queues, just a configuration space with a single flag. Down the road, I can imagine lots of useful extensions like the ability to send a stack trace to the host as part of the panic. That would be mighty handy. That's easy to add with virtio but pretty much impossible with the current device. Plus adding watchdog functionality would be a logical extension too. I believe that the watchdogs we emulate today are not supported by a majority of guests. A virtio-ras device with Windows and Linux drivers would give us a universally supported watchdog device. Regards, Anthony Liguori
Thanks, Laszlo

On Thu, Aug 22, 2013 at 01:25:32PM -0500, Anthony Liguori wrote:
I believe that the watchdogs we emulate today are not supported by a majority of guests.
BTW this is not true. The two watchdog devices are supported by all Linux guests. Windows guests do not support them, but Windows lacks[1] any sort of watchdog framework so lack of device support is the least of your problems. There would be nothing for the device to plug into (unlike the /dev/watchdog API on Linux), nor is there any daemon to support it (unlike the 15 year old watchdog daemon on Linux). Rich. [1] Yes, this exists: http://msdn.microsoft.com/en-us/library/ms856963.aspx but it requires a special version of Windows. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top

Il 22/08/2013 19:53, Laszlo Ersek ha scritto:
We should just introduce a simple watchdog device based on virtio and call it a day. Then it's cross platform, solves the guest enumeration problem, and libvirt can detect the presence of the new device. If the guest doesn't initialize the proposed virtio-panic device, then it will lie dormant too, just like the current pvpanic device. That's good.
However a new (standalone) virtio device will take up yet another PCI function (a full device if you want it to be hotpluggable). PCI functions are scarcer than ioports.
Not just that. Panic notifiers are called in a substantially unknown environment, with locks taken or interrupts already set up. This is why we went for a simple ISA device. Configuration via ACPI follows naturally from there, and anyway any other standard of the day would have had the same problem with Windows. At some point we had ACPI methods instead of a simple ioport write, but we had to remove that because the ACPI subsystem might have had its lock taken. Also, a virtio watchdog device makes little sense, IMHO. PV makes sense if emulation has insufficient performance, excessive CPU usage, or excessive complexity. We already have both an ISA and a PCI watchdog, and they serve their purpose wonderfully. Paolo

Paolo Bonzini <pbonzini@redhat.com> writes:
Il 22/08/2013 19:53, Laszlo Ersek ha scritto:
We should just introduce a simple watchdog device based on virtio and call it a day. Then it's cross platform, solves the guest enumeration problem, and libvirt can detect the presence of the new device. If the guest doesn't initialize the proposed virtio-panic device, then it will lie dormant too, just like the current pvpanic device. That's good.
However a new (standalone) virtio device will take up yet another PCI function (a full device if you want it to be hotpluggable). PCI functions are scarcer than ioports.
Not just that. Panic notifiers are called in a substantially unknown environment, with locks taken or interrupts already set up.
If you make the panic notify a config space write, then on virtio-pci, it's an outb to a fixed offset within a io address that after boot is static. So the address could be stored in a global and accessed without a lock.
This is why we went for a simple ISA device.
"Simple ISA device" doesn't exist outside of x86 and as we are learning, it's not all that simple.
Configuration via ACPI follows naturally from there, and anyway any other standard of the day would have had the same problem with Windows. At some point we had ACPI methods instead of a simple ioport write, but we had to remove that because the ACPI subsystem might have had its lock taken.
The difference is that ACPI or platform devices in general are unexpected to be added. By definition it means that the motherboard has most likely been changed. OTOH, a new PCI device is expected and most OSes will deal gracefully with it.
Also, a virtio watchdog device makes little sense, IMHO. PV makes sense if emulation has insufficient performance, excessive CPU usage, or excessive complexity. We already have both an ISA and a PCI watchdog, and they serve their purpose wonderfully.
Neither of which actually work with modern versions of Windows FWIW. Plus emulated watchdogs do not take into account steal time or overcommit in general. I've seen multiple cases where a naive watchdog has a problem in the field when the system is under heavy load. A PV watchdog actually makes sense because it can be defined based on guest run time instead of wall clock time. Regards, Anthony Liguori
Paolo

On 08/22/13 22:09, Anthony Liguori wrote:
The difference is that ACPI or platform devices in general are unexpected to be added. By definition it means that the motherboard has most likely been changed.
You could encounter a new ACPI artifact after simply re-flashing your MB with an updated BIOS, without opening the chassis. "If windows can't deal with that, their loss!" :) Laszlo /hides

On Thu, Aug 22, 2013 at 3:36 PM, Laszlo Ersek <lersek@redhat.com> wrote:
On 08/22/13 22:09, Anthony Liguori wrote:
The difference is that ACPI or platform devices in general are unexpected to be added. By definition it means that the motherboard has most likely been changed.
You could encounter a new ACPI artifact after simply re-flashing your MB with an updated BIOS, without opening the chassis. "If windows can't deal with that, their loss!" :)
I'm pretty sure "does Windows boot up okay" is on every major vendor's firmware test plan for shipping new updates... Regards, Anthony Liguori
Laszlo /hides

Il 22/08/2013 22:39, Anthony Liguori ha scritto:
On Thu, Aug 22, 2013 at 3:36 PM, Laszlo Ersek <lersek@redhat.com> wrote:
On 08/22/13 22:09, Anthony Liguori wrote:
The difference is that ACPI or platform devices in general are unexpected to be added. By definition it means that the motherboard has most likely been changed.
You could encounter a new ACPI artifact after simply re-flashing your MB with an updated BIOS, without opening the chassis. "If windows can't deal with that, their loss!" :)
I'm pretty sure "does Windows boot up okay" is on every major vendor's firmware test plan for shipping new updates...
For a firmware vendor it is perfectly okay to ship and require new drivers for functionality introduced by a firmware update... Paolo

On 22 August 2013 21:09, Anthony Liguori <anthony@codemonkey.ws> wrote:
Paolo Bonzini <pbonzini@redhat.com> writes:
Not just that. Panic notifiers are called in a substantially unknown environment, with locks taken or interrupts already set up.
If you make the panic notify a config space write, then on virtio-pci, it's an outb to a fixed offset within a io address that after boot is static.
So the address could be stored in a global and accessed without a lock.
Fine for virtio-mmio too, obviously. I have a vague recollection that config space writes on virtio-s390 are weird though. (would also be an issue if we wanted to implement the virtio-console "emergency write" functionality.) -- PMM

On Thu, Aug 22, 2013 at 03:09:06PM -0500, Anthony Liguori wrote:
Paolo Bonzini <pbonzini@redhat.com> writes:
Also, a virtio watchdog device makes little sense, IMHO. PV makes sense if emulation has insufficient performance, excessive CPU usage, or excessive complexity. We already have both an ISA and a PCI watchdog, and they serve their purpose wonderfully.
Neither of which actually work with modern versions of Windows FWIW.
Correct, although someone could write a driver!
Plus emulated watchdogs do not take into account steal time or overcommit in general. I've seen multiple cases where a naive watchdog has a problem in the field when the system is under heavy load.
The watchdog devices in qemu run on guest time. However the watchdog *daemon* inside the guest probably does behave badly as you describe. Changing the device model isn't going to help this, but it would definitely make sense to fix the daemon (although I don't know how -- is steal time exposed to guests?) I don't necessarily think a virtio-watchdog is a bad idea. For one thing it'd mean we would have a watchdog device that works on ARM. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top

On 08/27/2013 11:06 AM, Richard W.M. Jones wrote:
On Thu, Aug 22, 2013 at 03:09:06PM -0500, Anthony Liguori wrote:
Paolo Bonzini <pbonzini@redhat.com> writes:
Also, a virtio watchdog device makes little sense, IMHO. PV makes sense if emulation has insufficient performance, excessive CPU usage, or excessive complexity. We already have both an ISA and a PCI watchdog, and they serve their purpose wonderfully. Neither of which actually work with modern versions of Windows FWIW. Correct, although someone could write a driver!
Plus emulated watchdogs do not take into account steal time or overcommit in general. I've seen multiple cases where a naive watchdog has a problem in the field when the system is under heavy load. The watchdog devices in qemu run on guest time. However the watchdog *daemon* inside the guest probably does behave badly as you describe. Changing the device model isn't going to help this, but it would definitely make sense to fix the daemon (although I don't know how -- is steal time exposed to guests?)
I don't necessarily think a virtio-watchdog is a bad idea. For one thing it'd mean we would have a watchdog device that works on ARM.
Rich.
I believe that a watchdog is not the way to go. You need host-side decision making. Say that the guest did not receive CPU/Disk/network resources for a lengthy period of time, but the host knows that this is due to host resources availability. In such cases, you certainly do not want to reboot all the guests, especially since rebooting 50 Windows VMs could be a nightmare. BTW, Windows guest disable some of their watchdogs when they detect the presence of Hyper-V, we use it to overcome BSODs! So the right solution is to send a heart-beat to a management application (using qemu-ga or whatever), and let it decide how to handle it. Ronen.

On Tue, Aug 27, 2013 at 04:08:12PM +0300, Ronen Hod wrote:
So the right solution is to send a heart-beat to a management application (using qemu-ga or whatever), and let it decide how to handle it.
Agreed. The qemu watchdog lets you do this already. You can (using the qemu monitor, or libvirt) capture watchdog events and put them into your management application. Watchdog firing does *not* necessarily mean a guest reboot. [Note what I say applies to the qemu watchdog device. The Linux watchdog daemon may independently initiate a guest reboot, but you can configure it to perform other actions instead.] Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW

On Tue, Aug 27, 2013 at 8:20 AM, Richard W.M. Jones <rjones@redhat.com> wrote:
On Tue, Aug 27, 2013 at 04:08:12PM +0300, Ronen Hod wrote:
So the right solution is to send a heart-beat to a management application (using qemu-ga or whatever), and let it decide how to handle it.
This is host-centric solution and assumes that a management tool is making all of the decisions. This doesn't work in an IaaS environment where these sort of policy decisions need to be driven from the guest. Furthermore, you really want the watchdog daemon to run with real time priority which implies a heightened privilege level. This rules out using qemu-ga for that purpose.
Agreed. The qemu watchdog lets you do this already. You can (using the qemu monitor, or libvirt) capture watchdog events and put them into your management application. Watchdog firing does *not* necessarily mean a guest reboot.
Ack, but the current watchdog does not work for Windows guests and is not aware of guest time. That's why I think having a virtio-ilo makes sense. This is not a solved problem today. Regards, Anthony Liguori
[Note what I say applies to the qemu watchdog device. The Linux watchdog daemon may independently initiate a guest reboot, but you can configure it to perform other actions instead.]
Rich.
-- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW

On Tue, Aug 27, 2013 at 08:26:53AM -0500, Anthony Liguori wrote:
That's why I think having a virtio-ilo makes sense. This is not a solved problem today.
What's the scope of virtio-ilo? If it's anything like a real ILO it's going to do a lot of not-very-related things, such as: - pvpanic-type function - watchdog-type function - remote console / serial ports - remote CD-ROM - remote power switch qemu already does nearly all of this ... Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW

On Thu, Aug 22, 2013 at 09:16:57PM +0200, Paolo Bonzini wrote:
Il 22/08/2013 19:53, Laszlo Ersek ha scritto:
We should just introduce a simple watchdog device based on virtio and call it a day. Then it's cross platform, solves the guest enumeration problem, and libvirt can detect the presence of the new device. If the guest doesn't initialize the proposed virtio-panic device, then it will lie dormant too, just like the current pvpanic device. That's good.
However a new (standalone) virtio device will take up yet another PCI function (a full device if you want it to be hotpluggable). PCI functions are scarcer than ioports.
Not just that. Panic notifiers are called in a substantially unknown environment, with locks taken or interrupts already set up.
This is why we went for a simple ISA device. Configuration via ACPI follows naturally from there, and anyway any other standard of the day would have had the same problem with Windows. At some point we had ACPI methods instead of a simple ioport write, but we had to remove that because the ACPI subsystem might have had its lock taken.
Also, a virtio watchdog device makes little sense, IMHO. PV makes sense if emulation has insufficient performance, excessive CPU usage, or excessive complexity. We already have both an ISA and a PCI watchdog, and they serve their purpose wonderfully.
I also don't think that panic notifiers & watchdogs are really serving the same purpose. The panic notifier is an alert to a specific known kernel crash. A watchdog is merely a timeout, which is inferred to mean /something/ went wrong. Both have their uses IMHO & we should not conflate the two. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Tue, Aug 27, 2013 at 8:13 AM, Daniel P. Berrange <berrange@redhat.com> wrote:
On Thu, Aug 22, 2013 at 09:16:57PM +0200, Paolo Bonzini wrote:
Il 22/08/2013 19:53, Laszlo Ersek ha scritto:
We should just introduce a simple watchdog device based on virtio and call it a day. Then it's cross platform, solves the guest enumeration problem, and libvirt can detect the presence of the new device. If the guest doesn't initialize the proposed virtio-panic device, then it will lie dormant too, just like the current pvpanic device. That's good.
However a new (standalone) virtio device will take up yet another PCI function (a full device if you want it to be hotpluggable). PCI functions are scarcer than ioports.
Not just that. Panic notifiers are called in a substantially unknown environment, with locks taken or interrupts already set up.
This is why we went for a simple ISA device. Configuration via ACPI follows naturally from there, and anyway any other standard of the day would have had the same problem with Windows. At some point we had ACPI methods instead of a simple ioport write, but we had to remove that because the ACPI subsystem might have had its lock taken.
Also, a virtio watchdog device makes little sense, IMHO. PV makes sense if emulation has insufficient performance, excessive CPU usage, or excessive complexity. We already have both an ISA and a PCI watchdog, and they serve their purpose wonderfully.
I also don't think that panic notifiers & watchdogs are really serving the same purpose. The panic notifier is an alert to a specific known kernel crash. A watchdog is merely a timeout, which is inferred to mean /something/ went wrong. Both have their uses IMHO & we should not conflate the two.
Even if you ignore the watchdog aspect of this, having a portable panic notifier and the ability to enhance it to include more information (like the backtrace, etc.) is pretty darn useful. Regards, Anthony Liguori
Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Tue, Aug 27, 2013 at 02:13:34PM +0100, Daniel P. Berrange wrote:
On Thu, Aug 22, 2013 at 09:16:57PM +0200, Paolo Bonzini wrote:
Il 22/08/2013 19:53, Laszlo Ersek ha scritto:
We should just introduce a simple watchdog device based on virtio and call it a day. Then it's cross platform, solves the guest enumeration problem, and libvirt can detect the presence of the new device. If the guest doesn't initialize the proposed virtio-panic device, then it will lie dormant too, just like the current pvpanic device. That's good.
However a new (standalone) virtio device will take up yet another PCI function (a full device if you want it to be hotpluggable). PCI functions are scarcer than ioports.
Not just that. Panic notifiers are called in a substantially unknown environment, with locks taken or interrupts already set up.
This is why we went for a simple ISA device. Configuration via ACPI follows naturally from there, and anyway any other standard of the day would have had the same problem with Windows. At some point we had ACPI methods instead of a simple ioport write, but we had to remove that because the ACPI subsystem might have had its lock taken.
Also, a virtio watchdog device makes little sense, IMHO. PV makes sense if emulation has insufficient performance, excessive CPU usage, or excessive complexity. We already have both an ISA and a PCI watchdog, and they serve their purpose wonderfully.
I also don't think that panic notifiers & watchdogs are really serving the same purpose. The panic notifier is an alert to a specific known kernel crash. A watchdog is merely a timeout, which is inferred to mean /something/ went wrong. Both have their uses IMHO & we should not conflate the two.
Exactly this. They are two different things. Of course ILOs combine both (and more) in one mega-device :-) Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/

On 08/22/13 18:10, Paolo Bonzini wrote:
The thread from yesterday has died off (perhaps also because of my inappropriate answer to Michael, for which I apologize to him and everyone). I took some time to discuss the libvirt requirements further with Daniel Berrange and Eric Blake on IRC. If anyone is interested, I can give logs. This is a suggestion for how to proceed in both QEMU and libvirt.
The analysis is pretty overwhelming :) I have read Anthony's response and I'm not trying to argue -- I'm just spending a few thoughts on this and I'm willing to let them go to waste. In general I think we should minimize the quirks the user (who edits the libvirt XML) has to know about. Interactions between some XML elements, without explicit inter-references (formulated as attributes, like controller/index) are Bad (TM). So,
== Builtin pvpanic ==
QEMU will remove pvpanic from pc-1.5 in 1.6.1 and 1.5.4. This does not break migration.
== Support in libvirt for current functionality ==
libvirt will add a <panic-notifier/> element, and possibly a capability for it accessible via "virsh capabilities". There are two possibilities:
1) On QEMU 1.5.4/1.6.1 and newer (and on QEMU 1.6.0 with a machine type other than pc-1.5), <on_crash> will only work if the element is there. On QEMU 1.5.0->1.5.3, and on QEMU 1.6.0 with the pc-1.5 machine type, <on_crash> will be obeyed always, and may override e.g. reboot-on-panic if a guest driver exist.
I don't like this because there's some interplay between on_crash and panic_notifier, which even depends on the qemu version.
2) On all versions, <on_crash> will only work if the element is there.
I like this, because, if on_crash doesn't work without panic_notifier *at all*, then we can just drop panic_notifier, and make on_crash mean (on_crash && panic_notifier) in the original sense. IOW, drop "panic_notifier", and make "on_crash" work *always*.
In turn, there are two ways to implement (2):
2a) libvirt will always add -global pvpanic.iobase=0 to neutralize the builtin pvpanic device if present. <panic-notifier/> will create the device with -device pvpanic,iobase=0x505
Advantage: no changes to QEMU
Disadvantage 1: writes to port 0 with QEMU 1.{5.0,5.1,5.2,5.3,6.0} and pc-1.5 machine type will write to a pvpanic device instead of the DMA controller. Probably harmless, and limited to some QEMU versions.
Disadvantage 2: libvirt has knowledge of the pvpanic port number
Updating this paragraph with my above suggestion: - (s/pvpanic.iobase/pvpanic.ioport/g) - if "on_crash" is absent: - for 1.{5.0,5.1,5.2,5.3,6.0}, add -global pvpanic.ioport=0 - for other versions, do nothing - if "on_crash" is present: - for 1.{5.0,5.1,5.2,5.3,6.0}, do nothing, - for other versions, pass -device pvpanic (knowledge of 0x505 is unneeded) "advantage" and "disadvantage 1" remain, "disadvantage 2" is gone.
2b) QEMU will provide a way for libvirt to detect that no machine type has the builtin pvpanic. If some machine type may have the builtin pvpanic, and <panic-notifier/> is absent, libvirt will add "-global pvpanic.iobase=0" to neutralize it. Otherwise, libvirt will create the device normally.
A possible way for libvirt to detect "good" machine types is a dummy property. This is a bit ugly in that the property would not affect the behavior of the device. The property would remain in the long term.
Another possibility is for QEMU to rename the device, e.g. to isa-pvpanic. This is also somewhat gross, but not visible in the long term when the "pvpanic" name will be lost in history.
Advantage 1: libvirt has no knowledge of the pvpanic port number
Disadvantage 1: same as above
Disadvantage 2: need a somewhat gross change in QEMU
This method also provides an (also somewhat gross on the QEMU side) way to detect other changes in the pvpanic semantics. One example mentioned below, is making the panicked state temporary.
Too much work in qemu, in order to introduce ugliness, to hide older ugliness.
== Possible improvements to pvpanic ==
That's too complex / far out for me now, sorry :) Thanks, Laszlo

Laszlo Ersek <lersek@redhat.com> writes:
On 08/22/13 18:10, Paolo Bonzini wrote:
The thread from yesterday has died off (perhaps also because of my inappropriate answer to Michael, for which I apologize to him and everyone). I took some time to discuss the libvirt requirements further with Daniel Berrange and Eric Blake on IRC. If anyone is interested, I can give logs. This is a suggestion for how to proceed in both QEMU and libvirt.
The analysis is pretty overwhelming :)
I have read Anthony's response and I'm not trying to argue -- I'm just spending a few thoughts on this and I'm willing to let them go to waste.
In general I think we should minimize the quirks the user (who edits the libvirt XML) has to know about. Interactions between some XML elements, without explicit inter-references (formulated as attributes, like controller/index) are Bad (TM). So,
== Builtin pvpanic ==
QEMU will remove pvpanic from pc-1.5 in 1.6.1 and 1.5.4. This does not break migration.
== Support in libvirt for current functionality ==
libvirt will add a <panic-notifier/> element, and possibly a capability for it accessible via "virsh capabilities". There are two possibilities:
1) On QEMU 1.5.4/1.6.1 and newer (and on QEMU 1.6.0 with a machine type other than pc-1.5), <on_crash> will only work if the element is there. On QEMU 1.5.0->1.5.3, and on QEMU 1.6.0 with the pc-1.5 machine type, <on_crash> will be obeyed always, and may override e.g. reboot-on-panic if a guest driver exist.
I don't like this because there's some interplay between on_crash and panic_notifier, which even depends on the qemu version.
2) On all versions, <on_crash> will only work if the element is there.
I like this, because, if on_crash doesn't work without panic_notifier *at all*, then we can just drop panic_notifier, and make on_crash mean (on_crash && panic_notifier) in the original sense.
IOW, drop "panic_notifier", and make "on_crash" work *always*.
In turn, there are two ways to implement (2):
2a) libvirt will always add -global pvpanic.iobase=0 to neutralize the builtin pvpanic device if present. <panic-notifier/> will create the device with -device pvpanic,iobase=0x505
Advantage: no changes to QEMU
Disadvantage 1: writes to port 0 with QEMU 1.{5.0,5.1,5.2,5.3,6.0} and pc-1.5 machine type will write to a pvpanic device instead of the DMA controller. Probably harmless, and limited to some QEMU versions.
Disadvantage 2: libvirt has knowledge of the pvpanic port number
Updating this paragraph with my above suggestion:
- (s/pvpanic.iobase/pvpanic.ioport/g)
- if "on_crash" is absent: - for 1.{5.0,5.1,5.2,5.3,6.0}, add -global pvpanic.ioport=0 - for other versions, do nothing
- if "on_crash" is present: - for 1.{5.0,5.1,5.2,5.3,6.0}, do nothing, - for other versions, pass -device pvpanic (knowledge of 0x505 is unneeded)
Just to further complicate things... pvpanic is not going to be present on S390, PPC, ARM, or MIPS because of the fact that it's x86 specific. That means at some point there's going to have to be another device to support these platforms and libvirt will need to deal with that too. Regards, Anthony Liguori

On 22/08/13 20:33, Anthony Liguori wrote:
Laszlo Ersek <lersek@redhat.com> writes:
On 08/22/13 18:10, Paolo Bonzini wrote:
The thread from yesterday has died off (perhaps also because of my inappropriate answer to Michael, for which I apologize to him and everyone). I took some time to discuss the libvirt requirements further with Daniel Berrange and Eric Blake on IRC. If anyone is interested, I can give logs. This is a suggestion for how to proceed in both QEMU and libvirt.
The analysis is pretty overwhelming :)
I have read Anthony's response and I'm not trying to argue -- I'm just spending a few thoughts on this and I'm willing to let them go to waste.
In general I think we should minimize the quirks the user (who edits the libvirt XML) has to know about. Interactions between some XML elements, without explicit inter-references (formulated as attributes, like controller/index) are Bad (TM). So,
== Builtin pvpanic ==
QEMU will remove pvpanic from pc-1.5 in 1.6.1 and 1.5.4. This does not break migration.
== Support in libvirt for current functionality ==
libvirt will add a <panic-notifier/> element, and possibly a capability for it accessible via "virsh capabilities". There are two possibilities:
1) On QEMU 1.5.4/1.6.1 and newer (and on QEMU 1.6.0 with a machine type other than pc-1.5), <on_crash> will only work if the element is there. On QEMU 1.5.0->1.5.3, and on QEMU 1.6.0 with the pc-1.5 machine type, <on_crash> will be obeyed always, and may override e.g. reboot-on-panic if a guest driver exist.
I don't like this because there's some interplay between on_crash and panic_notifier, which even depends on the qemu version.
2) On all versions, <on_crash> will only work if the element is there.
I like this, because, if on_crash doesn't work without panic_notifier *at all*, then we can just drop panic_notifier, and make on_crash mean (on_crash && panic_notifier) in the original sense.
IOW, drop "panic_notifier", and make "on_crash" work *always*.
In turn, there are two ways to implement (2):
2a) libvirt will always add -global pvpanic.iobase=0 to neutralize the builtin pvpanic device if present. <panic-notifier/> will create the device with -device pvpanic,iobase=0x505
Advantage: no changes to QEMU
Disadvantage 1: writes to port 0 with QEMU 1.{5.0,5.1,5.2,5.3,6.0} and pc-1.5 machine type will write to a pvpanic device instead of the DMA controller. Probably harmless, and limited to some QEMU versions.
Disadvantage 2: libvirt has knowledge of the pvpanic port number
Updating this paragraph with my above suggestion:
- (s/pvpanic.iobase/pvpanic.ioport/g)
- if "on_crash" is absent: - for 1.{5.0,5.1,5.2,5.3,6.0}, add -global pvpanic.ioport=0 - for other versions, do nothing
- if "on_crash" is present: - for 1.{5.0,5.1,5.2,5.3,6.0}, do nothing, - for other versions, pass -device pvpanic (knowledge of 0x505 is unneeded)
Just to further complicate things...
pvpanic is not going to be present on S390, PPC, ARM, or MIPS because of the fact that it's x86 specific.
What about crashed state? I have implemented this state after the guest entered disabled wait (by convention used by all s390 OSes for crashes) See commit 08eb8c85e3967b97865d46acadf26dc908fbb094 Author: Christian Borntraeger <borntraeger@de.ibm.com> Date: Fri Apr 26 11:24:47 2013 +0800 Wire up disabled wait a panicked event on s390 Should we remove that as well? From s390 point of view, after allowing the crashed->running transition the feature would probably work as expected. Christian

Il 22/08/2013 19:15, Laszlo Ersek ha scritto:
2) On all versions, <on_crash> will only work if the element is there.
I like this, because, if on_crash doesn't work without panic_notifier *at all*, then we can just drop panic_notifier, and make on_crash mean (on_crash && panic_notifier) in the original sense.
IOW, drop "panic_notifier", and make "on_crash" work *always*.
No, we cannot because of backwards compatibility. VMs could have no on_crash element (which means <on_crash>destroy</on_crash>) and yet the guest admin could expect them to reboot on panic.
2b) QEMU will provide a way for libvirt to detect that no machine type has the builtin pvpanic. If some machine type may have the builtin pvpanic, and <panic-notifier/> is absent, libvirt will add "-global pvpanic.iobase=0" to neutralize it. Otherwise, libvirt will create the device normally.
A possible way for libvirt to detect "good" machine types is a dummy property. This is a bit ugly in that the property would not affect the behavior of the device. The property would remain in the long term.
Another possibility is for QEMU to rename the device, e.g. to isa-pvpanic. This is also somewhat gross, but not visible in the long term when the "pvpanic" name will be lost in history.
Advantage 1: libvirt has no knowledge of the pvpanic port number
Disadvantage 1: same as above
Disadvantage 2: need a somewhat gross change in QEMU
This method also provides an (also somewhat gross on the QEMU side) way to detect other changes in the pvpanic semantics. One example mentioned below, is making the panicked state temporary.
Too much work in qemu, in order to introduce ugliness, to hide older ugliness.
Is it too much work? s/"pvpanic"/"isa-pvpanic"? Paolo

On 08/22/13 21:19, Paolo Bonzini wrote:
Il 22/08/2013 19:15, Laszlo Ersek ha scritto:
2) On all versions, <on_crash> will only work if the element is there.
I like this, because, if on_crash doesn't work without panic_notifier *at all*, then we can just drop panic_notifier, and make on_crash mean (on_crash && panic_notifier) in the original sense.
IOW, drop "panic_notifier", and make "on_crash" work *always*.
No, we cannot because of backwards compatibility. VMs could have no on_crash element (which means <on_crash>destroy</on_crash>) and yet the guest admin could expect them to reboot on panic.
Ah. I thought "no on_crash" meant <on_crash>ignore</on_crash>, or something like that -- if on_crash was absent, the guest wouldn't see a working pvpanic device in ACPI, and wouldn't trigger the event in qemu.
2b) QEMU will provide a way for libvirt to detect that no machine type has the builtin pvpanic. If some machine type may have the builtin pvpanic, and <panic-notifier/> is absent, libvirt will add "-global pvpanic.iobase=0" to neutralize it. Otherwise, libvirt will create the device normally.
A possible way for libvirt to detect "good" machine types is a dummy property. This is a bit ugly in that the property would not affect the behavior of the device. The property would remain in the long term.
Another possibility is for QEMU to rename the device, e.g. to isa-pvpanic. This is also somewhat gross, but not visible in the long term when the "pvpanic" name will be lost in history.
Advantage 1: libvirt has no knowledge of the pvpanic port number
Disadvantage 1: same as above
Disadvantage 2: need a somewhat gross change in QEMU
This method also provides an (also somewhat gross on the QEMU side) way to detect other changes in the pvpanic semantics. One example mentioned below, is making the panicked state temporary.
Too much work in qemu, in order to introduce ugliness, to hide older ugliness.
Is it too much work? s/"pvpanic"/"isa-pvpanic"?
... I probably skipped the rename option because you called it gross (and maybe because I (erroneously?) recall Michael's opposition). I think I meant the dummy property under "too much work" (it may not be, in retrospect, but properties always imply compat stuff for me, and *that* is scary). Laszlo

On 08/22/2013 01:41 PM, Laszlo Ersek wrote:
On 08/22/13 21:19, Paolo Bonzini wrote:
Il 22/08/2013 19:15, Laszlo Ersek ha scritto:
2) On all versions, <on_crash> will only work if the element is there.
I like this, because, if on_crash doesn't work without panic_notifier *at all*, then we can just drop panic_notifier, and make on_crash mean (on_crash && panic_notifier) in the original sense.
IOW, drop "panic_notifier", and make "on_crash" work *always*.
No, we cannot because of backwards compatibility. VMs could have no on_crash element (which means <on_crash>destroy</on_crash>) and yet the guest admin could expect them to reboot on panic.
Ah. I thought "no on_crash" meant <on_crash>ignore</on_crash>, or something like that -- if on_crash was absent, the guest wouldn't see a working pvpanic device in ACPI, and wouldn't trigger the event in qemu.
Unfortunately, <on_crash>ignore</on_crash> does not exist in current libvirt codebase, and <on_crash> is always present on output (if omitted on input, it is present as <on_crash>destroy</on_crash> on output; but MOST vms have it as <on_crash>restart</on_crash> thanks to virt-install's defaults). In short, libvirt's problem is that older libvirt basically ignored the setting (whether default of destroy or set by virt-manager to restart), BOTH of those common options are most sensibly implemented by having a panic device, but adding a panic device is guest visible, and therefore must be controlled by some NEW piece of XML. If we add <on_crash>ignore</on_crash, and teach virt-install to start using it, that will help new guests, but won't change the problem for existing guests. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Hi All, I know it's been a long time since this thread. But qemu 1.7 is releasing, do you have any consensus on this? Thanks.

Note: CC list restricted to @redhat.com. On 10/29/13 17:01, Markus Armbruster wrote:
Ping!
Hu Tao <hutao@cn.fujitsu.com> writes:
Hi All,
I know it's been a long time since this thread. But qemu 1.7 is releasing, do you have any consensus on this?
IIRC Anthony wanted a brand new virtio device and no ties to ACPI or x86. Objections included (I think) that writing to a virtio device when panicked is more risky than an outb(). Feel free to re-read the thread at <http://thread.gmane.org/gmane.comp.emulators.qemu/229289>... However, please remember Ronen's email -- I'm attaching it. Thanks Laszlo

On Thu, Oct 24, 2013 at 10:39:03AM +0800, Hu Tao wrote:
Hi All,
I know it's been a long time since this thread. But qemu 1.7 is releasing, do you have any consensus on this?
Thanks.
I think the biggest issue is the new PANICKED state. Guests already have simple ways to halt the CPU, and actually do. I think a new state was a mistake. So how about the following? Does it break anything? (Untested). Signed-off-by: Michael S. Tsirkin <mst@redhat.com> diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c index 226e298..2055afc 100644 --- a/hw/misc/pvpanic.c +++ b/hw/misc/pvpanic.c @@ -51,7 +51,6 @@ static void handle_event(int event) if (event & PVPANIC_PANICKED) { panicked_mon_event("pause"); - vm_stop(RUN_STATE_GUEST_PANICKED); return; } }

On 10/31/2013 08:30 AM, Michael S. Tsirkin wrote:
On Thu, Oct 24, 2013 at 10:39:03AM +0800, Hu Tao wrote:
Hi All,
I know it's been a long time since this thread. But qemu 1.7 is releasing, do you have any consensus on this?
Thanks.
I think the biggest issue is the new PANICKED state. Guests already have simple ways to halt the CPU, and actually do. I think a new state was a mistake. So how about the following? Does it break anything? (Untested).
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c index 226e298..2055afc 100644 --- a/hw/misc/pvpanic.c +++ b/hw/misc/pvpanic.c @@ -51,7 +51,6 @@ static void handle_event(int event)
if (event & PVPANIC_PANICKED) { panicked_mon_event("pause"); - vm_stop(RUN_STATE_GUEST_PANICKED);
Don't you still need to halt the guest on a panic event, for management to have a chance to choose what to do about the panic? I'm suspecting this patch does break things. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Thu, Oct 31, 2013 at 3:32 PM, Eric Blake <eblake@redhat.com> wrote:
On 10/31/2013 08:30 AM, Michael S. Tsirkin wrote:
On Thu, Oct 24, 2013 at 10:39:03AM +0800, Hu Tao wrote:
Hi All,
I know it's been a long time since this thread. But qemu 1.7 is releasing, do you have any consensus on this?
Thanks.
I think the biggest issue is the new PANICKED state. Guests already have simple ways to halt the CPU, and actually do. I think a new state was a mistake. So how about the following? Does it break anything? (Untested).
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c index 226e298..2055afc 100644 --- a/hw/misc/pvpanic.c +++ b/hw/misc/pvpanic.c @@ -51,7 +51,6 @@ static void handle_event(int event)
if (event & PVPANIC_PANICKED) { panicked_mon_event("pause"); - vm_stop(RUN_STATE_GUEST_PANICKED);
Don't you still need to halt the guest on a panic event, for management to have a chance to choose what to do about the panic? I'm suspecting this patch does break things.
I would be happy to apply a patch that just reverted the whole dang mess of this device. Regards, Anthony Liguori
-- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Il 31/10/2013 15:32, Eric Blake ha scritto:
On 10/31/2013 08:30 AM, Michael S. Tsirkin wrote:
On Thu, Oct 24, 2013 at 10:39:03AM +0800, Hu Tao wrote:
Hi All,
I know it's been a long time since this thread. But qemu 1.7 is releasing, do you have any consensus on this?
I think the biggest issue is the new PANICKED state. Guests already have simple ways to halt the CPU, and actually do. I think a new state was a mistake. So how about the following? Does it break anything? (Untested).
Don't you still need to halt the guest on a panic event, for management to have a chance to choose what to do about the panic? I'm suspecting this patch does break things.
Yes, it does. But I think that, once we make the pvpanic device is optional, to a large extent there is no bug. Adding the pvpanic device to the VM will make libvirt obey <oncrash> instead of the in-guest setting, and that's it. Two months have passed and no casualties have been reported due to pvpanic. Let's just remove the auto-pvpanic from all machine types in 1.7 (yes, that's backwards incompatible in a strict sense), document it in the release notes, and hope that the old QEMU versions with mandatory pvpanic die of old age. All the advantages/disadvantages from my original messages still apply. Let's ignore the disadvantages and just KISS. Paolo -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJScmuVAAoJEBvWZb6bTYbyS6MP/Rb9japkUmjKy+bUW2Bf5vHD hOrCL/20LC17OdIdzOngjfcY6OyPrtZ6dtjzvSjsWoxjW+QNuTwLM5h+0kOmvlM/ UCqX9MZiKFysVphnwIFy2fzTKmAVw5WqUlS17XMsiKxpvCquy5Fss5PWB3k3orMD kRCoTTmQDID95T38OqezvzDaRjNvoHXOTuWFOhbzWt2OFUwg1WDy1GLKV/pSx0/o GS2/RYo5iQkEtZk0muj0fmWEZX1K2nfaYmVQh39LJhFrWQLabHskYz1+n/OTmwQT 3DiUkfeqrxLahlyHbIOguLjLY8gH2fZa0gzHEB7D0b1aiuqQmsxM3ELdl8pJJzex 1ZGKt9X1u58v/SvfDHW14uCMAiv8jgQDnDOa6pgi7DezBQym+90+RnzC2Z5BcHIp hsPEB0Bc47REPu69GOSj7XQ1uan5yQS38jSv8D11nJEW8VfXiV4smOZhivrEOicR mdYYV6BNc985vcVOmvmkTJ5VkUOKeMzMDAJkSDqN6P0fQKTOJpCtJag3TcjHVRB4 ORXUrdO9NuICGjwB75T86INkxEsXFaw1aHIKpcxMk2PN6u/Zc+n7GXf65ReGorbL 2QUrepzUUt+mTYXJha9h7gEudRnixe5zK8AgqtSHAJPKP++LrFz4lrmzCHC6e3V1 6K2AhZl4EjBJaf6AMy70 =GBBt -----END PGP SIGNATURE-----

On Thu, Oct 31, 2013 at 03:39:17PM +0100, Paolo Bonzini wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Il 31/10/2013 15:32, Eric Blake ha scritto:
On 10/31/2013 08:30 AM, Michael S. Tsirkin wrote:
On Thu, Oct 24, 2013 at 10:39:03AM +0800, Hu Tao wrote:
Hi All,
I know it's been a long time since this thread. But qemu 1.7 is releasing, do you have any consensus on this?
I think the biggest issue is the new PANICKED state. Guests already have simple ways to halt the CPU, and actually do. I think a new state was a mistake. So how about the following? Does it break anything? (Untested).
Don't you still need to halt the guest on a panic event, for management to have a chance to choose what to do about the panic? I'm suspecting this patch does break things.
Yes, it does.
What does it break exactly?
But I think that, once we make the pvpanic device is optional, to a large extent there is no bug. Adding the pvpanic device to the VM will make libvirt obey <oncrash> instead of the in-guest setting, and that's it.
Two months have passed and no casualties have been reported due to pvpanic. Let's just remove the auto-pvpanic from all machine types in 1.7 (yes, that's backwards incompatible in a strict sense), document it in the release notes, and hope that the old QEMU versions with mandatory pvpanic die of old age.
Nod. I'm fine with that.
All the advantages/disadvantages from my original messages still apply. Let's ignore the disadvantages and just KISS.
Paolo
I think we still need to do get rid of the PANICKED state somehow. If we can't replace it with RUNNING state, let's replace it with PAUSED. For example, you can't continue from panicked for some reason. You can't do a reset. But you can pause and then continue.
-----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQIcBAEBAgAGBQJScmuVAAoJEBvWZb6bTYbyS6MP/Rb9japkUmjKy+bUW2Bf5vHD hOrCL/20LC17OdIdzOngjfcY6OyPrtZ6dtjzvSjsWoxjW+QNuTwLM5h+0kOmvlM/ UCqX9MZiKFysVphnwIFy2fzTKmAVw5WqUlS17XMsiKxpvCquy5Fss5PWB3k3orMD kRCoTTmQDID95T38OqezvzDaRjNvoHXOTuWFOhbzWt2OFUwg1WDy1GLKV/pSx0/o GS2/RYo5iQkEtZk0muj0fmWEZX1K2nfaYmVQh39LJhFrWQLabHskYz1+n/OTmwQT 3DiUkfeqrxLahlyHbIOguLjLY8gH2fZa0gzHEB7D0b1aiuqQmsxM3ELdl8pJJzex 1ZGKt9X1u58v/SvfDHW14uCMAiv8jgQDnDOa6pgi7DezBQym+90+RnzC2Z5BcHIp hsPEB0Bc47REPu69GOSj7XQ1uan5yQS38jSv8D11nJEW8VfXiV4smOZhivrEOicR mdYYV6BNc985vcVOmvmkTJ5VkUOKeMzMDAJkSDqN6P0fQKTOJpCtJag3TcjHVRB4 ORXUrdO9NuICGjwB75T86INkxEsXFaw1aHIKpcxMk2PN6u/Zc+n7GXf65ReGorbL 2QUrepzUUt+mTYXJha9h7gEudRnixe5zK8AgqtSHAJPKP++LrFz4lrmzCHC6e3V1 6K2AhZl4EjBJaf6AMy70 =GBBt -----END PGP SIGNATURE-----

Il 31/10/2013 15:52, Michael S. Tsirkin ha scritto:
Yes, it does. What does it break exactly?
The point of a panicked event is to examine the guest at a particular moment in time (e.g. host-initiated crash dump). If you let the guest run, it may reboot and prevent you from getting a meaningful dump.
But I think that, once we make the pvpanic device is optional, to a large extent there is no bug. Adding the pvpanic device to the VM will make libvirt obey <oncrash> instead of the in-guest setting, and that's it.
Two months have passed and no casualties have been reported due to pvpanic. Let's just remove the auto-pvpanic from all machine types in 1.7 (yes, that's backwards incompatible in a strict sense), document it in the release notes, and hope that the old QEMU versions with mandatory pvpanic die of old age.
Nod. I'm fine with that.
I think we still need to do get rid of the PANICKED state somehow. If we can't replace it with RUNNING state, let's replace it with PAUSED.
For example, you can't continue from panicked for some reason. You can't do a reset. But you can pause and then continue.
We need to keep the PANICKED state, but we can make it a normal "resumable" state. Basically it's patches 1 and 2 at http://permalink.gmane.org/gmane.comp.emulators.qemu/229131. Rebasing will fix the problem highlighted in the commit message of patch 2. Paolo

On Thu, Oct 31, 2013 at 03:56:42PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 15:52, Michael S. Tsirkin ha scritto:
Yes, it does. What does it break exactly?
The point of a panicked event is to examine the guest at a particular moment in time (e.g. host-initiated crash dump). If you let the guest run, it may reboot and prevent you from getting a meaningful dump.
Well we trust guest anyway, so I think we can trust it to call halt.
But I think that, once we make the pvpanic device is optional, to a large extent there is no bug. Adding the pvpanic device to the VM will make libvirt obey <oncrash> instead of the in-guest setting, and that's it.
Two months have passed and no casualties have been reported due to pvpanic. Let's just remove the auto-pvpanic from all machine types in 1.7 (yes, that's backwards incompatible in a strict sense), document it in the release notes, and hope that the old QEMU versions with mandatory pvpanic die of old age.
Nod. I'm fine with that.
I think we still need to do get rid of the PANICKED state somehow. If we can't replace it with RUNNING state, let's replace it with PAUSED.
For example, you can't continue from panicked for some reason. You can't do a reset. But you can pause and then continue.
We need to keep the PANICKED state, but we can make it a normal "resumable" state.
If it's resumable how is it different from PAUSED?
Basically it's patches 1 and 2 at http://permalink.gmane.org/gmane.comp.emulators.qemu/229131. Rebasing will fix the problem highlighted in the commit message of patch 2.
Paolo
Looks like all transitions from paused state should be allowed from panicked state. So why keep it separate?

Il 31/10/2013 16:09, Michael S. Tsirkin ha scritto:
On Thu, Oct 31, 2013 at 03:56:42PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 15:52, Michael S. Tsirkin ha scritto:
Yes, it does. What does it break exactly?
The point of a panicked event is to examine the guest at a particular moment in time (e.g. host-initiated crash dump). If you let the guest run, it may reboot and prevent you from getting a meaningful dump.
Well we trust guest anyway, so I think we can trust it to call halt.
No, we cannot. Reboot-in-guest-after-dump-on-host is a perfectly fine configuration.
But I think that, once we make the pvpanic device is optional, to a large extent there is no bug. Adding the pvpanic device to the VM will make libvirt obey <oncrash> instead of the in-guest setting, and that's it.
Two months have passed and no casualties have been reported due to pvpanic. Let's just remove the auto-pvpanic from all machine types in 1.7 (yes, that's backwards incompatible in a strict sense), document it in the release notes, and hope that the old QEMU versions with mandatory pvpanic die of old age.
Nod. I'm fine with that.
I think we still need to do get rid of the PANICKED state somehow. If we can't replace it with RUNNING state, let's replace it with PAUSED.
For example, you can't continue from panicked for some reason. You can't do a reset. But you can pause and then continue.
We need to keep the PANICKED state, but we can make it a normal "resumable" state.
If it's resumable how is it different from PAUSED?
If the guest panics while for some reason libvirtd went down, libvirt can see what happened. It is similar to the "I/O error" state in this respect.
Looks like all transitions from paused state should be allowed from panicked state. So why keep it separate?
Because you can poll for the state instead of watching an event. Paolo

On Thu, Oct 31, 2013 at 04:26:13PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 16:09, Michael S. Tsirkin ha scritto:
On Thu, Oct 31, 2013 at 03:56:42PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 15:52, Michael S. Tsirkin ha scritto:
Yes, it does. What does it break exactly?
The point of a panicked event is to examine the guest at a particular moment in time (e.g. host-initiated crash dump). If you let the guest run, it may reboot and prevent you from getting a meaningful dump.
Well we trust guest anyway, so I think we can trust it to call halt.
No, we cannot. Reboot-in-guest-after-dump-on-host is a perfectly fine configuration.
But I think that, once we make the pvpanic device is optional, to a large extent there is no bug. Adding the pvpanic device to the VM will make libvirt obey <oncrash> instead of the in-guest setting, and that's it.
Two months have passed and no casualties have been reported due to pvpanic. Let's just remove the auto-pvpanic from all machine types in 1.7 (yes, that's backwards incompatible in a strict sense), document it in the release notes, and hope that the old QEMU versions with mandatory pvpanic die of old age.
Nod. I'm fine with that.
I think we still need to do get rid of the PANICKED state somehow. If we can't replace it with RUNNING state, let's replace it with PAUSED.
For example, you can't continue from panicked for some reason. You can't do a reset. But you can pause and then continue.
We need to keep the PANICKED state, but we can make it a normal "resumable" state.
If it's resumable how is it different from PAUSED?
If the guest panics while for some reason libvirtd went down, libvirt can see what happened. It is similar to the "I/O error" state in this respect.
Looks like all transitions from paused state should be allowed from panicked state. So why keep it separate?
Because you can poll for the state instead of watching an event.
Paolo
I see. Maybe it was a mistake to use a separate runtime state for this, but oh well. So it should be exactly like paused, we can just find all transitions from PAUSED and it should be same for PANICKED? Why isn't DEBUG allowed from PAUSED but allowed from PANICKED then? Maybe it should be allowed for PAUSED? -- MST

Il 31/10/2013 16:45, Michael S. Tsirkin ha scritto:
On Thu, Oct 31, 2013 at 04:26:13PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 16:09, Michael S. Tsirkin ha scritto:
On Thu, Oct 31, 2013 at 03:56:42PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 15:52, Michael S. Tsirkin ha scritto:
> Yes, it does. What does it break exactly?
The point of a panicked event is to examine the guest at a particular moment in time (e.g. host-initiated crash dump). If you let the guest run, it may reboot and prevent you from getting a meaningful dump.
Well we trust guest anyway, so I think we can trust it to call halt.
No, we cannot. Reboot-in-guest-after-dump-on-host is a perfectly fine configuration.
> But I think that, once we make the pvpanic device is > optional, to a large extent there is no bug. Adding the pvpanic > device to the VM will make libvirt obey <oncrash> instead of the > in-guest setting, and that's it. > > Two months have passed and no casualties have been reported due to > pvpanic. Let's just remove the auto-pvpanic from all machine types in > 1.7 (yes, that's backwards incompatible in a strict sense), document > it in the release notes, and hope that the old QEMU versions with > mandatory pvpanic die of old age.
Nod. I'm fine with that.
I think we still need to do get rid of the PANICKED state somehow. If we can't replace it with RUNNING state, let's replace it with PAUSED.
For example, you can't continue from panicked for some reason. You can't do a reset. But you can pause and then continue.
We need to keep the PANICKED state, but we can make it a normal "resumable" state.
If it's resumable how is it different from PAUSED?
If the guest panics while for some reason libvirtd went down, libvirt can see what happened. It is similar to the "I/O error" state in this respect.
Looks like all transitions from paused state should be allowed from panicked state. So why keep it separate?
Because you can poll for the state instead of watching an event.
I see. Maybe it was a mistake to use a separate runtime state for this, but oh well.
Yes, we should have had a list of "reasons" why a guest is stopped (I/O error, panic, gdb, ...) and a command to clear one or more of them; there can be paused/running/waiting-for-migration/... states, but many of the states we have are not necessarily in mutual exclusion. But we cannot really choose now.
So it should be exactly like paused, we can just find all transitions from PAUSED and it should be same for PANICKED? Why isn't DEBUG allowed from PAUSED but allowed from PANICKED then? Maybe it should be allowed for PAUSED?
PANICKED->DEBUG was added by commit bc7d0e667. That commit can be reverted if the panicked state is removed from runstate_needs_reset. Paolo

On Thu, Oct 31, 2013 at 04:56:07PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 16:45, Michael S. Tsirkin ha scritto:
On Thu, Oct 31, 2013 at 04:26:13PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 16:09, Michael S. Tsirkin ha scritto:
On Thu, Oct 31, 2013 at 03:56:42PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 15:52, Michael S. Tsirkin ha scritto:
>> Yes, it does. What does it break exactly?
The point of a panicked event is to examine the guest at a particular moment in time (e.g. host-initiated crash dump). If you let the guest run, it may reboot and prevent you from getting a meaningful dump.
Well we trust guest anyway, so I think we can trust it to call halt.
No, we cannot. Reboot-in-guest-after-dump-on-host is a perfectly fine configuration.
>> But I think that, once we make the pvpanic device is >> optional, to a large extent there is no bug. Adding the pvpanic >> device to the VM will make libvirt obey <oncrash> instead of the >> in-guest setting, and that's it. >> >> Two months have passed and no casualties have been reported due to >> pvpanic. Let's just remove the auto-pvpanic from all machine types in >> 1.7 (yes, that's backwards incompatible in a strict sense), document >> it in the release notes, and hope that the old QEMU versions with >> mandatory pvpanic die of old age.
Nod. I'm fine with that.
I think we still need to do get rid of the PANICKED state somehow. If we can't replace it with RUNNING state, let's replace it with PAUSED.
For example, you can't continue from panicked for some reason. You can't do a reset. But you can pause and then continue.
We need to keep the PANICKED state, but we can make it a normal "resumable" state.
If it's resumable how is it different from PAUSED?
If the guest panics while for some reason libvirtd went down, libvirt can see what happened. It is similar to the "I/O error" state in this respect.
Looks like all transitions from paused state should be allowed from panicked state. So why keep it separate?
Because you can poll for the state instead of watching an event.
I see. Maybe it was a mistake to use a separate runtime state for this, but oh well.
Yes, we should have had a list of "reasons" why a guest is stopped (I/O error, panic, gdb, ...) and a command to clear one or more of them; there can be paused/running/waiting-for-migration/... states, but many of the states we have are not necessarily in mutual exclusion.
But we cannot really choose now.
So it should be exactly like paused, we can just find all transitions from PAUSED and it should be same for PANICKED? Why isn't DEBUG allowed from PAUSED but allowed from PANICKED then? Maybe it should be allowed for PAUSED?
PANICKED->DEBUG was added by commit bc7d0e667. That commit can be reverted if the panicked state is removed from runstate_needs_reset.
Paolo
Okay so let's drop the code duplication and explicitly make them the same? Signed-off-by: Michael S. Tsirkin <mst@redhat.com> diff --git a/vl.c b/vl.c index 46c29c4..e12d317 100644 --- a/vl.c +++ b/vl.c @@ -638,10 +638,6 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG }, - { RUN_STATE_MAX, RUN_STATE_MAX }, }; @@ -660,6 +656,12 @@ static void runstate_init(void) for (p = &runstate_transitions_def[0]; p->from != RUN_STATE_MAX; p++) { runstate_valid_transitions[p->from][p->to] = true; + /* Panicked state is same as paused, we only made it different so + * management can detect a panic. + */ + if (p->from == RUN_STATE_PAUSED) { + runstate_valid_transitions[RUN_STATE_GUEST_PANICKED][p->to] = true; + } } } @@ -686,8 +688,7 @@ int runstate_is_running(void) bool runstate_needs_reset(void) { return runstate_check(RUN_STATE_INTERNAL_ERROR) || - runstate_check(RUN_STATE_SHUTDOWN) || - runstate_check(RUN_STATE_GUEST_PANICKED); + runstate_check(RUN_STATE_SHUTDOWN); } StatusInfo *qmp_query_status(Error **errp)

Il 31/10/2013 17:14, Michael S. Tsirkin ha scritto:
PANICKED->DEBUG was added by commit bc7d0e667. That commit can be reverted if the panicked state is removed from runstate_needs_reset.
Okay so let's drop the code duplication and explicitly make them the same?
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
diff --git a/vl.c b/vl.c index 46c29c4..e12d317 100644 --- a/vl.c +++ b/vl.c @@ -638,10 +638,6 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
- { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG }, - { RUN_STATE_MAX, RUN_STATE_MAX }, };
@@ -660,6 +656,12 @@ static void runstate_init(void)
for (p = &runstate_transitions_def[0]; p->from != RUN_STATE_MAX; p++) { runstate_valid_transitions[p->from][p->to] = true; + /* Panicked state is same as paused, we only made it different so + * management can detect a panic. + */ + if (p->from == RUN_STATE_PAUSED) { + runstate_valid_transitions[RUN_STATE_GUEST_PANICKED][p->to] = true;
It makes only sense to me if you do that for IO_ERROR and WATCHDOG as well, and perhaps there are others I'm missing. Just add a comment before runstate_transitions_def's entries for PANICKED, IO_ERROR and WATCHDOG. But again, it is somewhat separate from the issue at hand, which is to finally make pvpanic usable and hopefully before 1.7. Paolo
+ } } }
@@ -686,8 +688,7 @@ int runstate_is_running(void) bool runstate_needs_reset(void) { return runstate_check(RUN_STATE_INTERNAL_ERROR) || - runstate_check(RUN_STATE_SHUTDOWN) || - runstate_check(RUN_STATE_GUEST_PANICKED); + runstate_check(RUN_STATE_SHUTDOWN); }
StatusInfo *qmp_query_status(Error **errp)

On Thu, Oct 31, 2013 at 05:17:24PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 17:14, Michael S. Tsirkin ha scritto:
PANICKED->DEBUG was added by commit bc7d0e667. That commit can be reverted if the panicked state is removed from runstate_needs_reset.
Okay so let's drop the code duplication and explicitly make them the same?
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
diff --git a/vl.c b/vl.c index 46c29c4..e12d317 100644 --- a/vl.c +++ b/vl.c @@ -638,10 +638,6 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
- { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG }, - { RUN_STATE_MAX, RUN_STATE_MAX }, };
@@ -660,6 +656,12 @@ static void runstate_init(void)
for (p = &runstate_transitions_def[0]; p->from != RUN_STATE_MAX; p++) { runstate_valid_transitions[p->from][p->to] = true; + /* Panicked state is same as paused, we only made it different so + * management can detect a panic. + */ + if (p->from == RUN_STATE_PAUSED) { + runstate_valid_transitions[RUN_STATE_GUEST_PANICKED][p->to] = true;
It makes only sense to me if you do that for IO_ERROR and WATCHDOG as well, and perhaps there are others I'm missing. Just add a comment before runstate_transitions_def's entries for PANICKED, IO_ERROR and WATCHDOG.
But again, it is somewhat separate from the issue at hand, which is to finally make pvpanic usable and hopefully before 1.7.
Paolo
The issue is that you can't continue from panicked state. You should be able to do that without going through paused.
+ } } }
@@ -686,8 +688,7 @@ int runstate_is_running(void) bool runstate_needs_reset(void) { return runstate_check(RUN_STATE_INTERNAL_ERROR) || - runstate_check(RUN_STATE_SHUTDOWN) || - runstate_check(RUN_STATE_GUEST_PANICKED); + runstate_check(RUN_STATE_SHUTDOWN); }
StatusInfo *qmp_query_status(Error **errp)

Il 31/10/2013 17:26, Michael S. Tsirkin ha scritto:
On Thu, Oct 31, 2013 at 05:17:24PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 17:14, Michael S. Tsirkin ha scritto:
PANICKED->DEBUG was added by commit bc7d0e667. That commit can be reverted if the panicked state is removed from runstate_needs_reset.
Okay so let's drop the code duplication and explicitly make them the same?
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
diff --git a/vl.c b/vl.c index 46c29c4..e12d317 100644 --- a/vl.c +++ b/vl.c @@ -638,10 +638,6 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
- { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG }, - { RUN_STATE_MAX, RUN_STATE_MAX }, };
@@ -660,6 +656,12 @@ static void runstate_init(void)
for (p = &runstate_transitions_def[0]; p->from != RUN_STATE_MAX; p++) { runstate_valid_transitions[p->from][p->to] = true; + /* Panicked state is same as paused, we only made it different so + * management can detect a panic. + */ + if (p->from == RUN_STATE_PAUSED) { + runstate_valid_transitions[RUN_STATE_GUEST_PANICKED][p->to] = true;
It makes only sense to me if you do that for IO_ERROR and WATCHDOG as well, and perhaps there are others I'm missing. Just add a comment before runstate_transitions_def's entries for PANICKED, IO_ERROR and WATCHDOG.
But again, it is somewhat separate from the issue at hand, which is to finally make pvpanic usable and hopefully before 1.7.
Paolo
The issue is that you can't continue from panicked state. You should be able to do that without going through paused.
Yes, that's what my patch (posted the link before) does: - { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, + { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING }, { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG }, Comments don't compile, but are also easier to understand than code. Special logic in runstate_init is unnecessarily complicated, for a table that hardly sees any change. English works better, whoever modifies the table has it under their eyes. Paolo

On Thu, Oct 31, 2013 at 05:38:40PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 17:26, Michael S. Tsirkin ha scritto:
On Thu, Oct 31, 2013 at 05:17:24PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 17:14, Michael S. Tsirkin ha scritto:
PANICKED->DEBUG was added by commit bc7d0e667. That commit can be reverted if the panicked state is removed from runstate_needs_reset.
Okay so let's drop the code duplication and explicitly make them the same?
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
diff --git a/vl.c b/vl.c index 46c29c4..e12d317 100644 --- a/vl.c +++ b/vl.c @@ -638,10 +638,6 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
- { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG }, - { RUN_STATE_MAX, RUN_STATE_MAX }, };
@@ -660,6 +656,12 @@ static void runstate_init(void)
for (p = &runstate_transitions_def[0]; p->from != RUN_STATE_MAX; p++) { runstate_valid_transitions[p->from][p->to] = true; + /* Panicked state is same as paused, we only made it different so + * management can detect a panic. + */ + if (p->from == RUN_STATE_PAUSED) { + runstate_valid_transitions[RUN_STATE_GUEST_PANICKED][p->to] = true;
It makes only sense to me if you do that for IO_ERROR and WATCHDOG as well, and perhaps there are others I'm missing. Just add a comment before runstate_transitions_def's entries for PANICKED, IO_ERROR and WATCHDOG.
But again, it is somewhat separate from the issue at hand, which is to finally make pvpanic usable and hopefully before 1.7.
Paolo
The issue is that you can't continue from panicked state. You should be able to do that without going through paused.
Yes, that's what my patch (posted the link before) does:
- { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, + { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING }, { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },
Comments don't compile, but are also easier to understand than code. Special logic in runstate_init is unnecessarily complicated, for a table that hardly sees any change. English works better, whoever modifies the table has it under their eyes.
Paolo
But code duplication is bad. I think IO error for example is broken in that you can't pause but can run then pause. Seems strange. Internal error has same bug as panicked. So it's the same bug for a bunch of states, let's just have a way to say "this is same as paused". How's this? diff --git a/vl.c b/vl.c index 46c29c4..4388c95 100644 --- a/vl.c +++ b/vl.c @@ -593,12 +593,6 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_INMIGRATE, RUN_STATE_RUNNING }, { RUN_STATE_INMIGRATE, RUN_STATE_PAUSED }, - { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED }, - { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE }, - - { RUN_STATE_IO_ERROR, RUN_STATE_RUNNING }, - { RUN_STATE_IO_ERROR, RUN_STATE_FINISH_MIGRATE }, - { RUN_STATE_PAUSED, RUN_STATE_RUNNING }, { RUN_STATE_PAUSED, RUN_STATE_FINISH_MIGRATE }, @@ -635,16 +629,17 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING }, { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE }, - { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, - { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE }, - - { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG }, - { RUN_STATE_MAX, RUN_STATE_MAX }, }; +static const RunState runstate_paused[] = { + { RUN_STATE_GUEST_PANICKED}, + { RUN_STATE_IO_ERROR}, + { RUN_STATE_INTERNAL_ERROR}, + { RUN_STATE_WATCHDOG}, + { RUN_STATE_MAX }, +}; + static bool runstate_valid_transitions[RUN_STATE_MAX][RUN_STATE_MAX]; bool runstate_check(RunState state) @@ -655,12 +650,21 @@ bool runstate_check(RunState state) static void runstate_init(void) { const RunStateTransition *p; + const RunState *i; memset(&runstate_valid_transitions, 0, sizeof(runstate_valid_transitions)); for (p = &runstate_transitions_def[0]; p->from != RUN_STATE_MAX; p++) { runstate_valid_transitions[p->from][p->to] = true; } + /* Allow two-way transitions between identical states */ + for (i = &runstate_paused[0]; *p != RUN_STATE_MAX; p++) { + runstate_valid_transitions[*i][RUN_STATE_PAUSED] = true; + runstate_valid_transitions[RUN_STATE_PAUSED][*i] = true; + memcpy(&runstate_valid_transitions[*i], + &runstate_valid_transitions[RUN_STATE_PAUSED], + sizeof(runstate_valid_transitions[RUN_STATE_PAUSED])); + } } /* This function will abort() on invalid state transitions */ @@ -686,8 +690,6 @@ int runstate_is_running(void) bool runstate_needs_reset(void) { - return runstate_check(RUN_STATE_INTERNAL_ERROR) || - runstate_check(RUN_STATE_SHUTDOWN) || - runstate_check(RUN_STATE_GUEST_PANICKED); + return runstate_check(RUN_STATE_SHUTDOWN); } StatusInfo *qmp_query_status(Error **errp)

Il 31/10/2013 17:48, Michael S. Tsirkin ha scritto:
But code duplication is bad.
So should we make a table of IO_ERROR-like states to avoid code duplication? You have to draw a line somewhere...
I think IO error for example is broken in that you can't pause but can run then pause. Seems strange.
"cont" moves you out of IO_ERROR. IO_ERROR is already a non-running state (all states except RUNNING are non-running), "stop" is a no-op in non-running states. I don't like it that much either, but it works. Paolo

On Thu, Oct 31, 2013 at 05:52:11PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 17:48, Michael S. Tsirkin ha scritto:
But code duplication is bad.
So should we make a table of IO_ERROR-like states to avoid code duplication? You have to draw a line somewhere...
I think IO error for example is broken in that you can't pause but can run then pause. Seems strange.
"cont" moves you out of IO_ERROR. IO_ERROR is already a non-running state (all states except RUNNING are non-running), "stop" is a no-op in non-running states. I don't like it that much either, but it works.
Paolo
Interesting. Why do we have - { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED }, then?

Il 31/10/2013 18:00, Michael S. Tsirkin ha scritto:
Interesting. Why do we have - { RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED }, then?
It's only for non-resumable states (such as pvpanic right now). It's used here: if (qemu_reset_requested()) { pause_all_vcpus(); cpu_synchronize_all_states(); qemu_system_reset(VMRESET_REPORT); resume_all_vcpus(); if (runstate_needs_reset()) { runstate_set(RUN_STATE_PAUSED); } } Don't ask me what's happening with that resume_all_vcpus, because I have no idea. But I tested it now, and "system_reset" will indeed move you from "paused (internal-error)" to "paused" with RIP=0xfffffff0. Paolo

On Thu, Oct 31, 2013 at 05:38:40PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 17:26, Michael S. Tsirkin ha scritto:
On Thu, Oct 31, 2013 at 05:17:24PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 17:14, Michael S. Tsirkin ha scritto:
PANICKED->DEBUG was added by commit bc7d0e667. That commit can be reverted if the panicked state is removed from runstate_needs_reset.
Okay so let's drop the code duplication and explicitly make them the same?
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
diff --git a/vl.c b/vl.c index 46c29c4..e12d317 100644 --- a/vl.c +++ b/vl.c @@ -638,10 +638,6 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
- { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG }, - { RUN_STATE_MAX, RUN_STATE_MAX }, };
@@ -660,6 +656,12 @@ static void runstate_init(void)
for (p = &runstate_transitions_def[0]; p->from != RUN_STATE_MAX; p++) { runstate_valid_transitions[p->from][p->to] = true; + /* Panicked state is same as paused, we only made it different so + * management can detect a panic. + */ + if (p->from == RUN_STATE_PAUSED) { + runstate_valid_transitions[RUN_STATE_GUEST_PANICKED][p->to] = true;
It makes only sense to me if you do that for IO_ERROR and WATCHDOG as well, and perhaps there are others I'm missing. Just add a comment before runstate_transitions_def's entries for PANICKED, IO_ERROR and WATCHDOG.
But again, it is somewhat separate from the issue at hand, which is to finally make pvpanic usable and hopefully before 1.7.
Paolo
The issue is that you can't continue from panicked state. You should be able to do that without going through paused.
Yes, that's what my patch (posted the link before) does:
- { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, + { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING }, { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },
Anyway I agree with your patch. How about we drop RUN_STATE_DEBUG and drop RUN_STATE_GUEST_PANICKED from need reset list?
Comments don't compile, but are also easier to understand than code. Special logic in runstate_init is unnecessarily complicated, for a table that hardly sees any change. English works better, whoever modifies the table has it under their eyes.
Paolo

Il 31/10/2013 18:01, Michael S. Tsirkin ha scritto:
Yes, that's what my patch (posted the link before) does:
- { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, + { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING }, { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },
Anyway I agree with your patch. How about we drop RUN_STATE_DEBUG and drop RUN_STATE_GUEST_PANICKED from need reset list?
Yes, and also modify gdbstub.c. It's all in the URL I posted a few hours ago. Paolo

On Thu, Oct 31, 2013 at 06:10:36PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 18:01, Michael S. Tsirkin ha scritto:
Yes, that's what my patch (posted the link before) does:
- { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, + { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING }, { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },
Anyway I agree with your patch. How about we drop RUN_STATE_DEBUG and drop RUN_STATE_GUEST_PANICKED from need reset list?
Yes, and also modify gdbstub.c. It's all in the URL I posted a few hours ago.
Paolo
OK, so can you pls post patches 1 and 2? I'll review and ack.

Il 31/10/2013 18:18, Michael S. Tsirkin ha scritto:
On Thu, Oct 31, 2013 at 06:10:36PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 18:01, Michael S. Tsirkin ha scritto:
Yes, that's what my patch (posted the link before) does:
- { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, + { RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING }, { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },
Anyway I agree with your patch. How about we drop RUN_STATE_DEBUG and drop RUN_STATE_GUEST_PANICKED from need reset list?
Yes, and also modify gdbstub.c. It's all in the URL I posted a few hours ago.
Paolo
OK, so can you pls post patches 1 and 2? I'll review and ack.
Next Monday I will. Paolo

On Thu, Oct 31, 2013 at 05:17:24PM +0100, Paolo Bonzini wrote:
Il 31/10/2013 17:14, Michael S. Tsirkin ha scritto:
PANICKED->DEBUG was added by commit bc7d0e667. That commit can be reverted if the panicked state is removed from runstate_needs_reset.
Okay so let's drop the code duplication and explicitly make them the same?
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
diff --git a/vl.c b/vl.c index 46c29c4..e12d317 100644 --- a/vl.c +++ b/vl.c @@ -638,10 +638,6 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
- { RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, - { RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG }, - { RUN_STATE_MAX, RUN_STATE_MAX }, };
@@ -660,6 +656,12 @@ static void runstate_init(void)
for (p = &runstate_transitions_def[0]; p->from != RUN_STATE_MAX; p++) { runstate_valid_transitions[p->from][p->to] = true; + /* Panicked state is same as paused, we only made it different so + * management can detect a panic. + */ + if (p->from == RUN_STATE_PAUSED) { + runstate_valid_transitions[RUN_STATE_GUEST_PANICKED][p->to] = true;
It makes only sense to me if you do that for IO_ERROR and WATCHDOG as well,
Yea, let's do that.
and perhaps there are others I'm missing. Just add a comment before runstate_transitions_def's entries for PANICKED, IO_ERROR and WATCHDOG.
comments don't compile :)
But again, it is somewhat separate from the issue at hand, which is to finally make pvpanic usable and hopefully before 1.7.
Paolo
+ } } }
@@ -686,8 +688,7 @@ int runstate_is_running(void) bool runstate_needs_reset(void) { return runstate_check(RUN_STATE_INTERNAL_ERROR) || - runstate_check(RUN_STATE_SHUTDOWN) || - runstate_check(RUN_STATE_GUEST_PANICKED); + runstate_check(RUN_STATE_SHUTDOWN); }
StatusInfo *qmp_query_status(Error **errp)

On Thu, Oct 31, 2013 at 08:32:43AM -0600, Eric Blake wrote:
On 10/31/2013 08:30 AM, Michael S. Tsirkin wrote:
On Thu, Oct 24, 2013 at 10:39:03AM +0800, Hu Tao wrote:
Hi All,
I know it's been a long time since this thread. But qemu 1.7 is releasing, do you have any consensus on this?
Thanks.
I think the biggest issue is the new PANICKED state. Guests already have simple ways to halt the CPU, and actually do. I think a new state was a mistake. So how about the following? Does it break anything? (Untested).
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c index 226e298..2055afc 100644 --- a/hw/misc/pvpanic.c +++ b/hw/misc/pvpanic.c @@ -51,7 +51,6 @@ static void handle_event(int event)
if (event & PVPANIC_PANICKED) { panicked_mon_event("pause"); - vm_stop(RUN_STATE_GUEST_PANICKED);
Don't you still need to halt the guest on a panic event, for management to have a chance to choose what to do about the panic?
Guest can just call hlt to do this. Most guests do this on a panic already.
I'm suspecting this patch does break things.
-- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 10/31/2013 08:47 AM, Michael S. Tsirkin wrote:
if (event & PVPANIC_PANICKED) { panicked_mon_event("pause"); - vm_stop(RUN_STATE_GUEST_PANICKED);
Don't you still need to halt the guest on a panic event, for management to have a chance to choose what to do about the panic?
Guest can just call hlt to do this. Most guests do this on a panic already.
On the one hand, the fact that the guest already has to inform the host means we are already trusting the guest behavior on a panic. On the other hand, assuming that the guest will ALWAYS halt after triggering a panic is putting a lot more trust in the guest, compared to qemu explicitly halting the guest so that management has a chance to choose to dump the guest's state at the moment the panic was flagged. The biggest argument for either removing all auto-pvpanic, or reverting pvpanic altogether, is that no one seems to be actively using pvpanic in the field yet. I wish we could get more feedback from Fujitsu as the original patch authors on what they are looking for in a working solution, rather than repeatedly second-guessing everything downstream and delaying the eradication of the buggy behavior even longer. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Thu, Oct 31, 2013 at 08:49:08AM -0600, Eric Blake wrote:
On 10/31/2013 08:47 AM, Michael S. Tsirkin wrote:
if (event & PVPANIC_PANICKED) { panicked_mon_event("pause"); - vm_stop(RUN_STATE_GUEST_PANICKED);
Don't you still need to halt the guest on a panic event, for management to have a chance to choose what to do about the panic?
Guest can just call hlt to do this. Most guests do this on a panic already.
On the one hand, the fact that the guest already has to inform the host means we are already trusting the guest behavior on a panic. On the other hand, assuming that the guest will ALWAYS halt after triggering a panic is putting a lot more trust in the guest, compared to qemu explicitly halting the guest so that management has a chance to choose to dump the guest's state at the moment the panic was flagged.
I wouldn't call it *a lot* more trust. And again, this is guest policy: if you want to do hlt from driver because you think it's safer, go for it.
The biggest argument for either removing all auto-pvpanic, or reverting pvpanic altogether, is that no one seems to be actively using pvpanic in the field yet. I wish we could get more feedback from Fujitsu as the original patch authors on what they are looking for in a working solution, rather than repeatedly second-guessing everything downstream and delaying the eradication of the buggy behavior even longer.
With my patch we have a benign device that merely reports io writes on the monitor. No code -> no bugs.
Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 31/10/13 15:30, Michael S. Tsirkin wrote:
On Thu, Oct 24, 2013 at 10:39:03AM +0800, Hu Tao wrote:
Hi All,
I know it's been a long time since this thread. But qemu 1.7 is releasing, do you have any consensus on this?
Thanks.
I think the biggest issue is the new PANICKED state.
I thought the problem was that the new device broke windows and all the following hazzle.
Guests already have simple ways to halt the CPU, and actually do. I think a new state was a mistake. So how about the following? Does it break anything? (Untested).
Please note that on s390 we also do the panic state (on a disabled wait) "target-s390x/kvm.c" ... monitor_protocol_event(QEVENT_GUEST_PANICKED, data); qobject_decref(data); vm_stop(RUN_STATE_GUEST_PANICKED); ... Currently it is possible to restart libvirt, e.g. after an update and then it will be able to fetch the full state of a guest via QMP. It will then also be able to detect that this guest panicked some time ago. I think one issue when removing the PANICKED state is that libvirt can then no longer detect that state, correct? Christian
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c index 226e298..2055afc 100644 --- a/hw/misc/pvpanic.c +++ b/hw/misc/pvpanic.c @@ -51,7 +51,6 @@ static void handle_event(int event)
if (event & PVPANIC_PANICKED) { panicked_mon_event("pause"); - vm_stop(RUN_STATE_GUEST_PANICKED); return; } }
participants (12)
-
Anthony Liguori
-
Christian Borntraeger
-
Daniel P. Berrange
-
Eric Blake
-
Hu Tao
-
Laszlo Ersek
-
Markus Armbruster
-
Michael S. Tsirkin
-
Paolo Bonzini
-
Peter Maydell
-
Richard W.M. Jones
-
Ronen Hod