[libvirt] latency between LIFECYCLE event and notification generation

nishant burte

11 May 2013

5:41 a.m.

Hi,

I want to know the following about libvirt LIFECYCLE events.

1. The latency between these events happening and the notification being generated. For example, suppose a VM goes down. How long does it take to realize that the VM has gone down (going to, say, the DEFINED state) and then generate the notification?

2. Can someone please explain the sequence of steps that happens between a VM going down and the notification being generated?

Could you please answer both questions?

Thanks,
Nishant


Eric Blake

13 May 2013

7:44 a.m.

On 05/11/2013 07:41 AM, nishant burte wrote:

...

Hi,

I want to know the following about libvirt LIFECYCLE events.

1. The latency between these events happening and the notification being generated. For example, suppose a VM goes down. How long does it take to realize that the VM has gone down (going to, say, the DEFINED state) and then generate the notification?

Libvirt is not a real-time scheduler. We make no guarantees about when events will be delivered, and while it is likely that events are delivered in order, I'm not even brave enough to state that libvirt guarantees in-order delivery to remote hosts. All I know is that libvirt tries to deliver events as soon as it knows about them, but events are always best-effort, and you have to be prepared for the guest state to have changed yet again between when libvirt detected that an event should be delivered and when your code receives the event.
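Because the guest may have moved on by the time an event arrives, one defensive pattern is to treat a lifecycle event only as a hint and re-read the domain's current state inside the callback. The sketch below assumes the libvirt-python binding's `virDomain.state()` method, which returns a `[state, reason]` pair; the numeric table mirrors the `virDomainState` enum and is copied in only so the example runs without libvirt installed. `current_state_after_event` and `FakeDomain` are hypothetical names for illustration.

```python
from typing import List

# Numeric values of libvirt's virDomainState enum (VIR_DOMAIN_NOSTATE = 0
# through VIR_DOMAIN_PMSUSPENDED = 7), copied so the sketch runs without the
# libvirt bindings; real code should use the libvirt.VIR_DOMAIN_* constants.
DOMAIN_STATES = {
    0: "nostate", 1: "running", 2: "blocked", 3: "paused",
    4: "shutdown", 5: "shutoff", 6: "crashed", 7: "pmsuspended",
}


def current_state_after_event(dom) -> str:
    """Call from a lifecycle-event callback: treat the event as a hint that
    something changed, then ask for the state as of now."""
    state, _reason = dom.state()  # virDomain.state() returns [state, reason]
    return DOMAIN_STATES.get(state, f"unknown({state})")


class FakeDomain:
    """Hypothetical stand-in for a virDomain whose guest was restarted
    after a STOPPED event had already been queued for delivery."""

    def state(self) -> List[int]:
        return [1, 1]  # running, with an arbitrary reason code


if __name__ == "__main__":
    # The event said STOPPED, but the guest is already running again.
    print(current_state_after_event(FakeDomain()))
```

The design choice is simply that the event tells you *when* to look, while `state()` tells you *what* to believe.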

...

2. Can someone please explain the sequence of steps that happens between a VM going down and the notification being generated?

How is the guest shutting down? Guest-initiated action, libvirt shutdown request, or libvirt destroy request? Are you interested in the specifics used by the qemu hypervisor, or in lifecycle events in general, without regard to which hypervisor? You may be best off trying the sample programs shipped as part of libvirt (see examples/domain-events/events-{c,python} in libvirt.git) to see which events are triggered in response to which actions. [And someday, I'd like to teach virsh to deal with events.]

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
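In the spirit of the shipped sample programs, here is a minimal listener sketch for observing which lifecycle events a given action triggers. It assumes the libvirt-python calls `virEventRegisterDefaultImpl()`, `openReadOnly()`, `domainEventRegisterAny()` with `VIR_DOMAIN_EVENT_ID_LIFECYCLE`, and `virEventRunDefaultImpl()`, and a callback of the form `(conn, dom, event, detail, opaque)`. The name tables mirror the `virDomainEventType` and `virDomainEventStoppedDetailType` enums as I understand them and may lag newer libvirt releases; `describe_event` is a hypothetical helper.

```python
import sys

# virDomainEventType values, in enum order (DEFINED = 0 ... CRASHED = 8).
EVENT_NAMES = ("DEFINED", "UNDEFINED", "STARTED", "SUSPENDED", "RESUMED",
               "STOPPED", "SHUTDOWN", "PMSUSPENDED", "CRASHED")

# Detail codes that accompany VIR_DOMAIN_EVENT_STOPPED, in enum order.
STOPPED_DETAILS = ("SHUTDOWN", "DESTROYED", "CRASHED", "MIGRATED", "SAVED",
                   "FAILED", "FROM_SNAPSHOT")


def describe_event(event: int, detail: int) -> str:
    """Turn the numeric (event, detail) pair into a readable label."""
    if 0 <= event < len(EVENT_NAMES):
        name = EVENT_NAMES[event]
    else:
        name = f"EVENT_{event}"
    if name == "STOPPED" and 0 <= detail < len(STOPPED_DETAILS):
        return f"{name} ({STOPPED_DETAILS[detail]})"
    return f"{name} (detail {detail})"


def main(uri: str) -> None:
    import libvirt  # imported lazily so describe_event() works without it

    def lifecycle_cb(conn, dom, event, detail, opaque):
        print(f"{dom.name()}: {describe_event(event, detail)}", flush=True)

    # The event loop implementation must be registered before connecting.
    libvirt.virEventRegisterDefaultImpl()
    conn = libvirt.openReadOnly(uri)
    conn.domainEventRegisterAny(None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE,
                                lifecycle_cb, None)
    while True:
        libvirt.virEventRunDefaultImpl()  # run one iteration of the loop


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])  # e.g. qemu:///system
```

Running it with a URI and then doing a guest-initiated shutdown, a `virsh shutdown`, and a `virsh destroy` in turn should make the differences in the reported STOPPED detail visible.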

Daniel P. Berrange

8:01 a.m.

On Mon, May 13, 2013 at 09:44:35AM -0600, Eric Blake wrote:

...

On 05/11/2013 07:41 AM, nishant burte wrote:

...
Hi,

I want to know the following about libvirt LIFECYCLE events.

1. The latency between these events happening and the notification being generated. For example, suppose a VM goes down. How long does it take to realize that the VM has gone down (going to, say, the DEFINED state) and then generate the notification?

Libvirt is not a real-time scheduler. We make no guarantees about when events will be delivered, and while it is likely that events are delivered in order, I'm not even brave enough to state that libvirt guarantees in-order delivery to remote hosts. All I know is that libvirt tries to deliver events as soon as it knows about them, but events are always best-effort, and you have to be prepared for the guest state to have changed yet again between when libvirt detected that an event should be delivered and when your code receives the event.

FYI, we *do* guarantee to deliver events to clients in exactly the same order that libvirtd detects them, even to remote hosts, and we do not drop events. The RPC protocol is strictly serialized in its dispatch of events.

We make no guarantees about latency, though. There can be an arbitrary delay between libvirtd detecting the event and the client receiving it, though of course we aim to keep this latency as small as we can.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
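The distinction drawn here (ordering is guaranteed, latency is not) can be illustrated with a toy model: a single sender pushing events through one FIFO channel with a random delay before each. This is not libvirt's RPC code; `deliver_serially` is a hypothetical name, and the delays are synthetic. It only shows why a strictly serialized stream preserves order and loses nothing while still giving no bound on timing.

```python
import queue
import random
import threading
import time
from typing import List, Tuple


def deliver_serially(events: List[int], max_delay_s: float = 0.01,
                     seed: int = 0) -> List[Tuple[int, float]]:
    """Toy model of a strictly serialized event stream: one sender forwards
    events over a single FIFO channel, sleeping a random time before each.

    Returns (event, receive_time) pairs in the order the receiver saw them.
    """
    rng = random.Random(seed)
    channel = queue.Queue()  # a single FIFO, like a single RPC stream
    end = object()           # end-of-stream sentinel

    def sender() -> None:
        for ev in events:
            time.sleep(rng.uniform(0, max_delay_s))  # arbitrary latency
            channel.put(ev)
        channel.put(end)

    worker = threading.Thread(target=sender)
    worker.start()
    received = []
    while True:
        item = channel.get()
        if item is end:
            break
        received.append((item, time.monotonic()))
    worker.join()
    return received


if __name__ == "__main__":
    got = deliver_serially(list(range(10)))
    print([ev for ev, _ in got])  # same order as sent, nothing dropped
    gaps = [b - a for (_, a), (_, b) in zip(got, got[1:])]
    print(f"largest gap between deliveries: {max(gaps) * 1000:.1f} ms")
```

The practical consequence for a client is that it may rely on the sequence of events but should not use arrival time as a stand-in for when the event actually happened.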

nishant burte

10:26 p.m.

Thanks Eric and Daniel for the response. For the 2nd question, let me elaborate more.

...

2. Can someone please explain the sequence of steps that happens between a VM going down and the notification being generated?

Let's say the VM crashed. The question is: how does qemu (or the hypervisor) come to know about it, and how does it generate the notification?

For example, I am looking for an explanation along the following lines: while the VM is up, it pings the hypervisor periodically. When these heartbeats stop, the hypervisor detects that the VM has gone down and sends the notification to libvirt.

Could you please give the sequence of events along similar lines?

Thanks,
Nishant

Daniel P. Berrange

14 May 2013

1:42 a.m.

On Tue, May 14, 2013 at 11:56:48AM +0530, nishant burte wrote:

...

Thanks Eric and Daniel for the response. For the 2nd question, let me elaborate more.

...
2. Can someone please explain the sequence of steps that happens between a VM going down and the notification being generated?

Let's say the VM crashed. The question is: how does qemu (or the hypervisor) come to know about it, and how does it generate the notification?

Libvirt has a connection to the QEMU monitor socket. So when QEMU dies, the monitor socket triggers POLLHUP or POLLERR in libvirt's event loop, at which point libvirt transitions the VM state to shutoff.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
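The detection step Daniel describes can be reproduced on any POSIX system: when a process exits, the kernel closes its file descriptors, and the other end of the socket then polls as hung up. The sketch below uses a `socket.socketpair()` as a stand-in for the monitor connection and closes one end to simulate QEMU exiting. It is an illustration of the POLLHUP/POLLERR mechanism, not libvirt's event-loop code; `watch_monitor` is a hypothetical name, and the "shutoff" return value just marks the point where libvirt would change the VM state.

```python
import select
import socket
from typing import Optional


def watch_monitor(sock: socket.socket, timeout_ms: int = 1000) -> Optional[str]:
    """Poll a stand-in monitor socket once.

    Returns "shutoff" if the peer (the QEMU side) has gone away, or None if
    the peer is still alive after timeout_ms.
    """
    poller = select.poll()  # POSIX only
    poller.register(sock, select.POLLIN | select.POLLHUP | select.POLLERR)
    for _fd, revents in poller.poll(timeout_ms):
        if revents & (select.POLLHUP | select.POLLERR):
            return "shutoff"
        if revents & select.POLLIN:
            # Some platforms report only POLLIN when the peer closes; an
            # empty peek then means end-of-file, i.e. the peer is gone.
            if sock.recv(1, socket.MSG_PEEK) == b"":
                return "shutoff"
    return None


if __name__ == "__main__":
    libvirt_end, qemu_end = socket.socketpair()
    print(watch_monitor(libvirt_end, 100))  # None: the "QEMU" end is open
    qemu_end.close()  # simulate the QEMU process dying
    print(watch_monitor(libvirt_end, 100))  # shutoff
    libvirt_end.close()
```

Note that nothing here depends on the guest cooperating: the host notices because a process it was talking to disappeared, which is why this path only covers the "qemu process died" level of crashing.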

Eric Blake

2:48 p.m.

On 05/14/2013 12:26 AM, nishant burte wrote:

...

Thanks Eric and Daniel for the response. For the 2nd question, let me elaborate more.

...
2. Can someone please explain the sequence of steps that happens between a VM going down and the notification being generated?

Let's say the VM crashed. The question is: how does qemu (or the hypervisor) come to know about it, and how does it generate the notification?

What do you mean, the VM crashed? It's an honest question: there are two levels of crashing. Either the qemu process that is running the VM crashed [host bug], or the guest itself went into a panic in some way observable by qemu [guest bug]. Right now, qemu can only report the first level of crashing (a qemu failure), and we HOPE those are rare. You can also wire up a watchdog device into your guest; if the guest doesn't feed the watchdog often enough, qemu can detect that, again as a first-level approximation.

Patches that add a pvpanic device have been accepted for qemu 1.5; using the device also depends on a new enough Linux kernel in the guest. With that device in place, if the guest detects a panic, it can write a last-ditch message to the dedicated device to give second-level panic reporting. Libvirt still needs to be wired up to expose this second-level reporting in guest XML. Also, while the device could theoretically be used by any guest OS, right now I only know of new enough Linux kernels that know how to use it (that is, I don't know if anyone has written a Windows driver to be installed in a guest to tell qemu when Windows goes into BSOD).
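The watchdog route mentioned above is configured through a `<watchdog>` element inside the domain XML's `<devices>` section, and it only helps if a watchdog daemon inside the guest actually feeds the emulated timer. The sketch below builds that element with the standard library; the model and action lists are the values I am confident libvirt's domain XML documentation accepts and are deliberately not exhaustive (newer libvirt versions accept more). `watchdog_device_xml` is a hypothetical helper. When the watchdog fires, libvirt reports it through a separate watchdog event (`VIR_DOMAIN_EVENT_ID_WATCHDOG`) rather than only as a lifecycle change.

```python
import xml.etree.ElementTree as ET

# Values documented for libvirt's <watchdog> element; not exhaustive.
WATCHDOG_MODELS = {"i6300esb", "ib700"}
WATCHDOG_ACTIONS = {"reset", "shutdown", "poweroff", "pause", "none", "dump"}


def watchdog_device_xml(model: str = "i6300esb", action: str = "reset") -> str:
    """Return a <watchdog> element to place inside a domain's <devices>."""
    if model not in WATCHDOG_MODELS:
        raise ValueError(f"unsupported watchdog model: {model}")
    if action not in WATCHDOG_ACTIONS:
        raise ValueError(f"unsupported watchdog action: {action}")
    elem = ET.Element("watchdog", {"model": model, "action": action})
    return ET.tostring(elem, encoding="unicode")


if __name__ == "__main__":
    # Pause the guest (rather than reset it) if the in-guest daemon stops
    # feeding the emulated i6300esb timer, so its state can be inspected.
    print(watchdog_device_xml(action="pause"))
```

Choosing `pause` over `reset` trades availability for the chance to examine a hung guest; which is right depends on whether you care more about recovery or diagnosis.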

...

For example, I am looking for an explanation along the following lines: while the VM is up, it pings the hypervisor periodically. When these heartbeats stop, the hypervisor detects that the VM has gone down and sends the notification to libvirt.

Could you please give the sequence of events along similar lines?

Other than a watchdog device or a dedicated pvpanic device, I don't know of any heartbeat at the qemu level. A guest should run the same as it does on bare metal, so how do you detect when a bare metal machine has gone down? If you can answer that (for example, if you have a heartbeat at the IP level for deciding when to fence a bare metal machine), then you can set up that same heartbeat to decide whether a guest is still up, but it would be at a higher level than what qemu/libvirt provide. At the libvirt layer, we generally try to avoid relying on the guest any more than we have to (it's more secure to assume the guest is malicious and therefore avoid making your behavior depend on actions by the guest).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
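A higher-level heartbeat of the kind Eric suggests lives entirely outside qemu/libvirt: something you already trust (an IP-level ping, a TCP probe, an agent report) records when each guest was last seen, and a monitor declares a guest down after several missed intervals. The sketch below shows only the bookkeeping; how heartbeats are collected and what to do about a down guest (fencing, restart) are left open. `HeartbeatMonitor`, `beat`, and `down_guests` are hypothetical names, and the clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Declare a guest down once it has been silent for more than
    missed_limit heartbeat intervals."""

    def __init__(self, interval_s: float, missed_limit: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval_s = interval_s
        self.missed_limit = missed_limit
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def beat(self, guest: str) -> None:
        """Record a heartbeat from some out-of-band channel."""
        self.last_seen[guest] = self.clock()

    def down_guests(self) -> List[str]:
        """Guests whose silence exceeds interval_s * missed_limit."""
        limit = self.interval_s * self.missed_limit
        now = self.clock()
        return sorted(g for g, seen in self.last_seen.items()
                      if now - seen > limit)


if __name__ == "__main__":
    fake_now = [0.0]
    mon = HeartbeatMonitor(1.0, missed_limit=3, clock=lambda: fake_now[0])
    mon.beat("web")
    mon.beat("db")
    fake_now[0] = 2.0
    mon.beat("web")          # "web" keeps answering, "db" goes quiet
    fake_now[0] = 3.5
    print(mon.down_guests())  # ['db']
```

Requiring several missed intervals rather than one reduces false positives from a single lost probe, at the cost of slower detection; that trade-off is the same one you would make for bare metal.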


participants (3)

Daniel P. Berrange
Eric Blake
nishant burte