[libvirt] Lifecycle events during reboot for KVM and Xen

Hi,
While debugging a problem[1] in OpenStack Nova, I noticed the following:
Rebooting a KVM VM from inside the guest (with the "reboot" command) doesn't send any lifecycle events (events debugged with [2]). Doing the same thing with a Xen VM leads to 2 events: first a VIR_DOMAIN_EVENT_STOPPED and then a VIR_DOMAIN_EVENT_STARTED event.
The problem here is that in the Xen case it doesn't seem to be possible to recognize that a reboot is ongoing. For that reason the OpenStack Nova component just forces the domain to stop after receiving the VIR_DOMAIN_EVENT_STOPPED event.
Is it expected that the 2 drivers send different events for the same action, or is this a bug in qemu/xen/libvirt?
Cheers,
Tom
[1] https://bugs.launchpad.net/nova/+bug/1293480
[2] https://gist.github.com/toabctl/53f26989ad7634a3168b

Thomas Bechtold wrote:
Hi,
during debugging a problem[1] of Openstack Nova I recognized the following:
Doing a reboot (from inside of the VM with "reboot" command) on a kvm VM doesn't send any lifecycle events (events debugged with [2]). Doing the same thing with a xen VM leads to 2 events: First a VIR_DOMAIN_EVENT_STOPPED and then a VIR_DOMAIN_EVENT_STARTED event.
Yep. Same can be said for new libxl Xen driver too.
The problem here is that for the xen case it doesn't seem to be possible to recognize that a reboot is ongoing.
Right. There is no VIR_DOMAIN_EVENT_REBOOTED event type.
For that reason the OpenStack Nova component just forces the domain to stop after receiving the VIR_DOMAIN_EVENT_STOPPED event.
Yikes!
Is it expected that the 2 drivers send different events for the same action or a bug in qemu/xen/libvirt?
You mentioned above that the qemu driver doesn't send *any* events when a reboot occurs within the VM. Looking at the code seems to confirm that. We could certainly change the Xen drivers to behave similarly, but I'd like to hear opinions from other libvirt devs. Options for resolving this include:
1. Remove emitting the events from Xen drivers
2. Add the events to qemu driver and fix nova
3. Add VIR_DOMAIN_EVENT_REBOOTED, adapt drivers to use it, and fix nova
Regards,
Jim

On 08/08/2014 06:54 PM, Jim Fehlig wrote:
1. Remove emitting the events from Xen drivers
2. Add the events to qemu driver and fix nova
Just for the record: I already proposed a patch[1] for Nova to get some comments from the Nova folks. The patch basically delays the Nova stop API call by a couple of seconds. If a VIR_DOMAIN_EVENT_STARTED is received during the wait period, the stop API call is canceled and the VM can start. But I think solution 3) would be better.
3. Add VIR_DOMAIN_EVENT_REBOOTED, adapt drivers to use it, and fix nova
As mentioned, I think that would be nice to have.
Best,
Tom
[1] https://review.openstack.org/#/c/112946/
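For readers following along, here is a rough sketch of the delayed-stop idea described in this message. It is not the actual Nova patch: the class name `DelayedStopHandler`, the `stop_callback` parameter, and the `delay` default are all hypothetical names chosen for illustration. The two event constants only mirror the values of libvirt's virDomainEventType enum so the sketch runs without libvirt installed.

```python
import threading

# Values mirror libvirt's virDomainEventType enum (STARTED = 2, STOPPED = 5).
VIR_DOMAIN_EVENT_STARTED = 2
VIR_DOMAIN_EVENT_STOPPED = 5


class DelayedStopHandler:
    """Hypothetical helper: react to STOPPED only if no STARTED follows soon."""

    def __init__(self, stop_callback, delay=2.0):
        self._stop_callback = stop_callback  # assumed: called with the domain UUID
        self._delay = delay                  # wait period in seconds
        self._pending = {}                   # domain UUID -> (token, timer)
        self._lock = threading.Lock()

    def handle_event(self, domain_uuid, event):
        if event not in (VIR_DOMAIN_EVENT_STARTED, VIR_DOMAIN_EVENT_STOPPED):
            return  # other lifecycle events are ignored in this sketch
        with self._lock:
            # Either event invalidates a stop that is still waiting.
            entry = self._pending.pop(domain_uuid, None)
            if entry is not None:
                entry[1].cancel()
            if event == VIR_DOMAIN_EVENT_STOPPED:
                token = object()  # identifies this particular scheduled stop
                timer = threading.Timer(
                    self._delay, self._fire, args=(domain_uuid, token))
                timer.daemon = True
                self._pending[domain_uuid] = (token, timer)
                timer.start()

    def _fire(self, domain_uuid, token):
        with self._lock:
            entry = self._pending.get(domain_uuid)
            if entry is None or entry[0] is not token:
                return  # cancelled or superseded while the timer was firing
            del self._pending[domain_uuid]
        self._stop_callback(domain_uuid)
```

With this shape, a STOPPED followed by a STARTED inside the wait period (the Xen reboot case from this thread) never reaches `stop_callback`, while a STOPPED with no follow-up does once the delay expires. The obvious cost is that every genuine stop is handled `delay` seconds late.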

On Fri, Aug 08, 2014 at 10:54:53AM -0600, Jim Fehlig wrote:
Thomas Bechtold wrote:
Hi,
during debugging a problem[1] of Openstack Nova I recognized the following:
Doing a reboot (from inside of the VM with "reboot" command) on a kvm VM doesn't send any lifecycle events (events debugged with [2]). Doing the same thing with a xen VM leads to 2 events: First a VIR_DOMAIN_EVENT_STOPPED and then a VIR_DOMAIN_EVENT_STARTED event.
Yep. Same can be said for new libxl Xen driver too.
The problem here is that for the xen case it doesn't seem to be possible to recognize that a reboot is ongoing.
Right. There is no VIR_DOMAIN_EVENT_REBOOTED event type.
For that reason the OpenStack Nova component just forces the domain to stop after receiving the VIR_DOMAIN_EVENT_STOPPED event.
Yikes!
Is it expected that the 2 drivers send different events for the same action or a bug in qemu/xen/libvirt?
You mentioned above that the qemu driver doesn't send *any* events when a reboot occurs within the VM. Looking at the code seems to confirm that. We could certainly change the Xen drivers to behave similarly, but I'd like to hear opinions from other libvirt devs. Options for resolving this include
1. Remove emitting the events from Xen drivers
Possibly, though I'm not convinced this is actually feasible.
2. Add the events to qemu driver and fix nova
3. Add VIR_DOMAIN_EVENT_REBOOTED, adapt drivers to use it, and fix nova
FYI, not having VIR_DOMAIN_EVENT_REBOOTED was an explicit decision, since a reboot is not a lifecycle transition. We do have a separate event though, VIR_DOMAIN_EVENT_ID_REBOOT, to report on the case where the machine has a reset. I think we probably need to make Nova robust to the current behaviour.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
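As an illustration of the separate reboot event mentioned here, the following is a minimal sketch of how a client could listen for both the lifecycle event and VIR_DOMAIN_EVENT_ID_REBOOT through the libvirt Python bindings. The helper name `register_callbacks`, the `lv` parameter (the imported libvirt module, passed in so the function can be exercised without a hypervisor), and the `log` list are assumptions made for this example. Whether the Xen drivers emit the reboot event is not stated in this thread, so that should not be assumed.

```python
def on_lifecycle(conn, dom, event, detail, opaque):
    # Lifecycle callbacks receive the event type (e.g. STARTED, STOPPED)
    # and a detail code giving the reason.
    opaque.append(("lifecycle", dom.name(), event, detail))


def on_reboot(conn, dom, opaque):
    # The reboot event carries no arguments beyond the domain itself.
    opaque.append(("reboot", dom.name()))


def register_callbacks(conn, lv, log):
    """Register both callbacks for all domains; return the callback IDs.

    conn -- an open libvirt connection (virConnect)
    lv   -- the libvirt module (hypothetical parameter, for testability)
    log  -- a list that the callbacks append to
    """
    return [
        conn.domainEventRegisterAny(
            None, lv.VIR_DOMAIN_EVENT_ID_LIFECYCLE, on_lifecycle, log),
        conn.domainEventRegisterAny(
            None, lv.VIR_DOMAIN_EVENT_ID_REBOOT, on_reboot, log),
    ]


# Typical use against a real hypervisor (not executed here):
#   import libvirt
#   libvirt.virEventRegisterDefaultImpl()   # before opening the connection
#   conn = libvirt.openReadOnly("qemu:///system")
#   log = []
#   register_callbacks(conn, libvirt, log)
#   while True:
#       libvirt.virEventRunDefaultImpl()    # dispatches pending events
```

A consumer such as Nova could then treat a reboot event as "guest reset, domain still running" and reserve its stop handling for lifecycle STOPPED events, subject to the per-driver differences discussed in this thread.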

On Thu, Aug 07, 2014 at 08:18:54AM +0200, Thomas Bechtold wrote:
Hi,
during debugging a problem[1] of Openstack Nova I recognized the following:
Doing a reboot (from inside of the VM with "reboot" command) on a kvm VM doesn't send any lifecycle events (events debugged with [2]). Doing the same thing with a xen VM leads to 2 events: First a VIR_DOMAIN_EVENT_STOPPED and then a VIR_DOMAIN_EVENT_STARTED event. The problem here is that for the xen case it doesn't seem to be possible to recognize that a reboot is ongoing. For that reason the OpenStack Nova component just forces the domain to stop after receiving the VIR_DOMAIN_EVENT_STOPPED event.
Is it expected that the 2 drivers send different events for the same action or a bug in qemu/xen/libvirt?
The lifecycle events reflect changes in the state of the machine hardware. When a guest OS reboots in QEMU, the actual hardware does not do a running -> shutoff -> running transition. Hence it is not appropriate to emit a STOPPED/STARTED event in this case. I have a feeling that in Xen though, the actual Xen domain is torn down & created again, though perhaps that is actually different for PV vs HVM domains.
Regards,
Daniel
participants (3)
- Daniel P. Berrange
- Jim Fehlig
- Thomas Bechtold