Re: [libvirt] discuss about pvpanic

On 08/01/20 09:25, zhenwei pi wrote:
> Hey, Paolo
> Currently, pvpanic only supports bit 0 (PVPANIC_PANICKED). We usually expect the guest to write the I/O port (typically 0x505) from a panic_notifier_list callback while handling a panic; QEMU then handles the pvpanic event PVPANIC_PANICKED.
> On the other hand, a guest that handles the crash with kdump-tools reboots without running any panic_notifier_list callback. QEMU then only knows that the guest has rebooted (because the guest writes the 0xcf9 I/O port for an RCR request), but it cannot tell why the guest reset.
> In our production environment we see 100+ guest reboot events every day, and unfortunately we cannot separate abnormal reboots from normal operation.
> We want to add a new bit for pvpanic events (maybe PVPANIC_CRASHLOADED) to indicate that the guest has crashed and that the panic is handled by the guest kernel itself (here is the previous patch: https://lkml.org/lkml/2019/12/14/265).
> What do you think about this solution? Or do you have any other suggestions?
Hi Zhenwei,
the kernel-side patch certainly makes sense. I assume that you want the event to propagate up from QEMU to Libvirt and so on? The QEMU patch would need to declare a new event (qapi/misc.json) and send it in handle_event (hw/misc/pvpanic.c). I'm not familiar with Libvirt, so I'm adding the respective list.
Another possibility is to simply not write to pvpanic if kexec_crash_loaded() returns true; this would match what xen_panic_event does, for example. The kexec kernel would then log the panic normally, without the need for MMIO at all.
However, I have no problem with adding a new bit to the pvpanic I/O port, so once you post the QEMU patch I will certainly ack the kernel side.
Thanks,
Paolo
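The two guest-side options discussed above can be sketched as follows. This is a Python model, not the kernel's C driver: the bit values follow the thread (bit 0 is PVPANIC_PANICKED, bit 1 is the proposed crash-loaded bit), while the function names and the boolean standing in for kexec_crash_loaded() are hypothetical.

```python
from typing import Optional

# pvpanic I/O port bits as discussed in this thread.
PVPANIC_PANICKED = 1 << 0      # existing bit 0
PVPANIC_CRASH_LOADED = 1 << 1  # proposed bit 1


def panic_event(crash_kernel_loaded: bool) -> int:
    """Value a guest panic notifier would write to the pvpanic port.

    `crash_kernel_loaded` stands in for the kernel's kexec_crash_loaded();
    this is a model of the proposal, not the driver code.
    """
    if crash_kernel_loaded:
        # A kdump kernel will handle the crash inside the guest:
        # report "crashed, handled by the guest" rather than "panicked".
        return PVPANIC_CRASH_LOADED
    return PVPANIC_PANICKED


def panic_event_skip_write(crash_kernel_loaded: bool) -> Optional[int]:
    """Paolo's alternative (like xen_panic_event): write nothing when a
    crash kernel is loaded. None means no write to the port happens."""
    if crash_kernel_loaded:
        return None
    return PVPANIC_PANICKED
```

With the second variant the host again sees only the reset, which is the gap described in the original message; the first variant gives QEMU a distinct value (2) to act on.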

On 1/8/20 10:36 AM, Paolo Bonzini wrote:
> Hi Zhenwei,
> the kernel-side patch certainly makes sense. I assume that you want the event to propagate up from QEMU to Libvirt and so on? The QEMU patch would need to declare a new event (qapi/misc.json) and send it in handle_event (hw/misc/pvpanic.c). I'm not familiar with Libvirt, so I'm adding the respective list.
Adding an event is fairly easy if all you want libvirt to do is report the event to upper layers. I volunteer to do it. The question is how QEMU is going to report this: with some attributes on the GUEST_PANICKED event, or with a new event. But more important is getting the change merged into the kernel.
Michal
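One way to picture the QEMU side is a Python model (not QEMU's C code) of how handle_event in hw/misc/pvpanic.c might turn the written byte into a QMP event under the "new event" option. The event name GUEST_CRASHLOADED and the precedence rule when both bits are set are assumptions for illustration.

```python
from typing import Optional

PVPANIC_PANICKED = 1 << 0
PVPANIC_CRASH_LOADED = 1 << 1


def handle_event(value: int) -> Optional[str]:
    """Map a byte written to the pvpanic port to the QMP event to emit.

    Models the role of handle_event() under the "new event" option.
    GUEST_CRASHLOADED is a placeholder name; bits other than the two
    above are ignored.
    """
    if value & PVPANIC_PANICKED:
        # Assumed rule: a real panic takes precedence if both bits are set.
        return "GUEST_PANICKED"
    if value & PVPANIC_CRASH_LOADED:
        return "GUEST_CRASHLOADED"
    return None
```

The "attributes on GUEST_PANICKED" alternative would instead always emit GUEST_PANICKED and carry the crash-loaded state as an extra field, which is exactly what upper layers could misread as a request to react.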

On 08/01/20 10:58, Michal Privoznik wrote:
> Adding an event is fairly easy if all you want libvirt to do is report the event to upper layers. I volunteer to do it. The question is how QEMU is going to report this: with some attributes on the GUEST_PANICKED event, or with a new event.
I think it should be a new event; using GUEST_PANICKED could cause upper layers to react by shutting down or rebooting the guest.
Thanks,
Paolo
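To illustrate why a separate event matters, here is a hypothetical consumer of QMP event messages (each a JSON object with an "event" key), again in Python. The reaction names, the placeholder GUEST_CRASHLOADED event and the reset-labeling logic are assumptions; they only show how the daily reboots from the original report could be split into crash-handled and normal ones without triggering a panic policy such as libvirt's on_crash setting.

```python
import json
from typing import Iterable, List, Optional

# Hypothetical reactions of an upper-layer manager to QMP events.
# GUEST_PANICKED may trigger the configured crash policy (for example a
# restart or destroy); the crash-loaded event is only recorded, because
# the guest's own kdump kernel is already handling the crash.
REACTIONS = {
    "GUEST_PANICKED": "apply-crash-policy",
    "GUEST_CRASHLOADED": "record-only",  # placeholder event name
}


def reaction(qmp_line: str) -> Optional[str]:
    """Reaction for one QMP event message, or None if none is configured."""
    return REACTIONS.get(json.loads(qmp_line).get("event"))


def label_resets(qmp_lines: Iterable[str]) -> List[str]:
    """Label each RESET event "crash" if a crash-loaded event came first."""
    labels = []
    crash_pending = False
    for line in qmp_lines:
        event = json.loads(line).get("event")
        if event == "GUEST_CRASHLOADED":
            crash_pending = True
        elif event == "RESET":
            labels.append("crash" if crash_pending else "normal")
            crash_pending = False
    return labels
```

In this sketch the crash-loaded event never maps to the crash policy, so a guest recovering through kdump is not shut down or rebooted a second time by the management layer, yet its reset is still counted as abnormal.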

On 1/8/20 6:05 PM, Paolo Bonzini wrote:
> I think it should be a new event; using GUEST_PANICKED could cause upper layers to react by shutting down or rebooting the guest.
In the previous patch (https://lkml.org/lkml/2019/12/14/265), I defined a new bit (bit 1), PVPANIC_CRASH_LOADED, for the guest crash-loaded event. As suggested by Greg KH, I moved the bit definition to a uapi header file, so QEMU could include that header from the Linux headers and handle the new event.
-- 
Thanks and Best Regards,
zhenwei pi

On 08/01/20 11:33, zhenwei pi wrote:
> In the previous patch (https://lkml.org/lkml/2019/12/14/265), I defined a new bit (bit 1), PVPANIC_CRASH_LOADED, for the guest crash-loaded event. As suggested by Greg KH, I moved the bit definition to a uapi header file, so QEMU could include that header from the Linux headers and handle the new event.
Sure; QEMU already has a mechanism to import files from Linux (scripts/update-linux-headers.sh).
Paolo
participants (3)
- Michal Privoznik
- Paolo Bonzini
- zhenwei pi