Hi,

I recently started using the libvirt domain events to make my VM state watchers more responsive.
In general it works pretty well. I just listen to the events and do a periodic resync to cope with missed events.

While watching the events I ran into a few interesting situations that I want to share. Points 1-3 describe some minor issues or irregularities. Point 4 is about the fact that domain and state updates are not versioned, which makes it very hard to stay in sync with libvirt when using events.

My libvirt version is 1.2.18.4.

1) Event order seems to be weird on startup: 

When listening for VM lifecycle events I get this order:

{"event_type": "Started", "timestamp": "2016-11-25T11:59:53.209326Z", "reason": "Booted", "domain_name": "generic", "domain_id": "8ff7047b-fb46-44ff-a4c6-7c20c73ab86e"}
{"event_type": "Defined", "timestamp": "2016-11-25T11:59:53.435530Z", "reason": "Added", "domain_name": "generic", "domain_id": "8ff7047b-fb46-44ff-a4c6-7c20c73ab86e"}

It is strange that a VM already boots before it is defined. Is this the intended order?

2) Defining a VM with VIR_DOMAIN_START_PAUSED gives me this event order:

{"event_type": "Defined", "timestamp": "2016-11-25T12:02:44.037817Z", "reason": "Added", "domain_name": "core_node", "domain_id": "b9906489-6d5b-40f8-a742-ca71b2b84277"}
{"event_type": "Resumed", "timestamp": "2016-11-25T12:02:44.813104Z", "reason": "Unpaused", "domain_name": "core_node", "domain_id": "b9906489-6d5b-40f8-a742-ca71b2b84277"}
{"event_type": "Started", "timestamp": "2016-11-25T12:02:44.813733Z", "reason": "Booted", "domain_name": "core_node", "domain_id": "b9906489-6d5b-40f8-a742-ca71b2b84277"}

This event order makes it hard to track active domains by listening to lifecycle events. In theory one could always fetch the VM state in the event callback and check it, but since the state is not delivered with the event itself, it can already be outdated by the time it is fetched. That makes this approach racy (in a way that is not visible to the user of the libvirt bindings), and as described in (3) it is currently not even possible. In general, the events that are actually emitted seem to differ quite significantly from the lifecycle described in [1].
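
As a rough illustration of what a watcher has to do to cope with this, here is a minimal sketch that folds lifecycle events into a per-domain view without relying on their order. The event dicts follow the format of my log output above. The "Undefined", "Suspended" and "Stopped" names and the DomainView class are assumptions made for this example, not something taken from libvirt.

```python
from dataclasses import dataclass


@dataclass
class DomainView:
    """Client-side view of one domain (hypothetical helper type)."""
    defined: bool = False  # a persistent definition is known
    active: bool = False   # a running or paused instance is known
    paused: bool = False


def apply_event(domains, event):
    """Fold one lifecycle event into `domains`, keyed by domain UUID."""
    uuid = event["domain_id"]
    view = domains.setdefault(uuid, DomainView())
    kind = event["event_type"]
    if kind == "Defined":
        view.defined = True
    elif kind == "Undefined":      # assumed event name
        view.defined = False
    elif kind == "Started":
        view.active = True
    elif kind == "Resumed":
        view.active = True
        view.paused = False
    elif kind == "Suspended":      # assumed event name
        view.active = True
        view.paused = True
    elif kind == "Stopped":        # assumed event name
        view.active = False
        view.paused = False
    # Forget domains that are neither defined nor active.
    if not view.defined and not view.active:
        del domains[uuid]
    return domains
```

With this, "Started" followed by "Defined" (point 1) and "Defined" followed by "Started" end in the same view. It does not solve the staleness problem, it only avoids depending on the order of these particular events.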

3) "Defined" event is triggered before the domain is completely defined
 
{"event_type": "Defined", "timestamp": "2016-11-25T12:02:44.037817Z", "reason": "Added", "domain_name": "core_node", "domain_id": "b9906489-6d5b-40f8-a742-ca71b2b84277"}
{"event_type": "Resumed", "timestamp": "2016-11-25T12:02:44.813104Z", "reason": "Unpaused", "domain_name": "core_node", "domain_id": "b9906489-6d5b-40f8-a742-ca71b2b84277"}
{"event_type": "Started", "timestamp": "2016-11-25T12:02:44.813733Z", "reason": "Booted", "domain_name": "core_node", "domain_id": "b9906489-6d5b-40f8-a742-ca71b2b84277"}

When I try to process the first event and dump the domain XML, I get:

   Event: [Code-42] [Domain-10] Domain not found: no domain with matching uuid 'b9906489-6d5b-40f8-a742-ca71b2b84277' (core_node)

So it seems like I get the event before the domain is completely ready.
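
One possible client-side workaround is to treat the lookup failure as transient and retry a few times before leaving the domain to the next periodic resync. The sketch below assumes a hypothetical fetch_xml callable and a hypothetical DomainNotFound exception standing in for the error shown above; neither is part of the libvirt bindings.

```python
import time


class DomainNotFound(Exception):
    """Hypothetical stand-in for the 'Domain not found' error above."""


def fetch_xml_with_retry(fetch_xml, uuid, attempts=3, delay=0.05,
                         sleep=time.sleep):
    """Call fetch_xml(uuid), retrying while the domain is not found.

    Returns the XML string, or None if the domain is still missing
    after `attempts` tries (the caller can then wait for a resync).
    """
    for attempt in range(attempts):
        try:
            return fetch_xml(uuid)
        except DomainNotFound:
            # Only wait if another attempt will follow.
            if attempt + 1 < attempts:
                sleep(delay)
    return None
```

Sleeping inside the event callback would stall the handling of other events, so in practice the retry would probably have to be deferred to a separate worker. It also only hides the race, it does not remove it.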

4) The libvirt domain description is not versioned

I would expect that every time a domain XML is updated (an update from a third-party entity) or an event is generated (an update from libvirt), the resource version of the domain is increased, and that I get this resource version when I dump the XML or receive an event. Without this there is, as far as I know, no way to stay in sync with libvirt, even with regular polling of all domains. The main issue is that I can never know whether the events in the queue arrived before my latest domain resync or after it.

Also note that this is not about delivery guarantees for events. It is just about having a consistent view of a VM and of each individual event. With resource versions, I can decide whether an event is still relevant to me, which is exactly what I need to solve the syncing problem above.
When I do a complete relisting of all domains to sync, I know which version I got, and for every event I can then see whether it is newer or older.

It would be even better if the domain XML, the VM state, and the resource version were sent to the client along with the event. Then, whenever there is a new event for a VM in the queue, I can be sure that the domain XML I see is the one that triggered the event. This XML is then a complete representation of that revision number.
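
To make the idea in point 4 concrete, this is roughly what the client side could look like if such a version existed. The resource_version and domain_xml fields are purely hypothetical; libvirt does not provide them today, which is the point of this mail.

```python
def resync(snapshot):
    """Build a fresh cache from a full relisting.

    `snapshot` is an iterable of (uuid, resource_version, domain_xml)
    tuples; all three fields are hypothetical.
    """
    return {uuid: (version, xml) for uuid, version, xml in snapshot}


def apply_versioned_event(cache, event):
    """Apply `event` only if it is newer than what the cache holds.

    Returns True if the cache was updated, False if the event was
    stale (generated at or before the cached version).
    """
    uuid = event["domain_id"]
    known = cache.get(uuid)
    if known is not None and event["resource_version"] <= known[0]:
        return False  # already covered by the last resync or event
    cache[uuid] = (event["resource_version"], event["domain_xml"])
    return True
```

An event that was queued before the relisting carries a version that is not higher than the cached one and is simply dropped, so the order of resync and event processing no longer matters.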


It would be nice to hear your thoughts on these points.

Best Regards,
Roman

[1] https://wiki.libvirt.org/page/VM_lifecycle#States_that_a_guest_domain_can_be_in