Hi,
I recently started using libvirt domain events to make my VM state
watchers more responsive.
In general it works pretty well. I just listen to the events and do a
periodic resync to cope with missed events.
While watching the events I ran into a few interesting situations I wanted
to share. The points 1-3 describe some minor issues or irregularities.
Point 4 is about the fact that domain and state updates are not versioned
which makes it very hard to stay in sync with libvirt when using events.
My libvirt version is 1.2.18.4.
1) Event order seems to be weird on startup:
When listening for VM lifecycle events I get this order:
  {"event_type": "Started", "timestamp": "2016-11-25T11:59:53.209326Z",
   "reason": "Booted", "domain_name": "generic",
   "domain_id": "8ff7047b-fb46-44ff-a4c6-7c20c73ab86e"}
  {"event_type": "Defined", "timestamp": "2016-11-25T11:59:53.435530Z",
   "reason": "Added", "domain_name": "generic",
   "domain_id": "8ff7047b-fb46-44ff-a4c6-7c20c73ab86e"}
It is strange that a VM already boots before it is defined. Is this the
intended order?
2) Defining a VM with VIR_DOMAIN_START_PAUSED gives me this event order:
  {"event_type": "Defined", "timestamp": "2016-11-25T12:02:44.037817Z",
   "reason": "Added", "domain_name": "core_node",
   "domain_id": "b9906489-6d5b-40f8-a742-ca71b2b84277"}
  {"event_type": "Resumed", "timestamp": "2016-11-25T12:02:44.813104Z",
   "reason": "Unpaused", "domain_name": "core_node",
   "domain_id": "b9906489-6d5b-40f8-a742-ca71b2b84277"}
  {"event_type": "Started", "timestamp": "2016-11-25T12:02:44.813733Z",
   "reason": "Booted", "domain_name": "core_node",
   "domain_id": "b9906489-6d5b-40f8-a742-ca71b2b84277"}
This event order makes it hard to track active domains by listening to
life-cycle events. In theory one could always fetch the VM state in the
event callback and check it, but since the state is not delivered with the
event itself, it can already be outdated by the time it is fetched. That
makes this approach racy (in a way that is not visible to users of the
libvirt bindings), and as described in (3) it is currently not even
possible. In general, the events actually emitted seem to differ quite
significantly from the life-cycle described in [1].
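A minimal sketch of one client-side way to tolerate this order: treat both
"Started" and "Resumed" as "the domain is active", so it does not matter
which one arrives first. The events are plain dicts shaped like the log
lines above. The class name ActiveDomainTracker is hypothetical, and the
"Stopped" event type name is an assumption, since it does not appear in
the output above.

```python
# Event types that imply the domain is active. Handling "Resumed" the same
# way as "Started" makes the tracker independent of their relative order.
ACTIVE_EVENTS = {"Started", "Resumed"}
# Assumed name for the event that reports the domain is no longer active.
INACTIVE_EVENTS = {"Stopped"}


class ActiveDomainTracker:
    """Hypothetical helper that tracks active domains by UUID."""

    def __init__(self):
        self.active = set()

    def handle(self, event):
        # `event` is a dict with the same keys as the log lines above.
        uuid = event["domain_id"]
        kind = event["event_type"]
        if kind in ACTIVE_EVENTS:
            self.active.add(uuid)  # adding twice is harmless
        elif kind in INACTIVE_EVENTS:
            self.active.discard(uuid)
        # Other event types (e.g. "Defined") are ignored here.
```

This only works around the ordering. It does not remove the need for a
periodic resync, because a missed event still leaves the set outdated.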
3) "Defined" event is triggered before the domain is completely defined:
  {"event_type": "Defined", "timestamp": "2016-11-25T12:02:44.037817Z",
   "reason": "Added", "domain_name": "core_node",
   "domain_id": "b9906489-6d5b-40f8-a742-ca71b2b84277"}
  {"event_type": "Resumed", "timestamp": "2016-11-25T12:02:44.813104Z",
   "reason": "Unpaused", "domain_name": "core_node",
   "domain_id": "b9906489-6d5b-40f8-a742-ca71b2b84277"}
  {"event_type": "Started", "timestamp": "2016-11-25T12:02:44.813733Z",
   "reason": "Booted", "domain_name": "core_node",
   "domain_id": "b9906489-6d5b-40f8-a742-ca71b2b84277"}
When I try to process the first event and dump the domain XML, I get:
Event: [Code-42] [Domain-10] Domain not found: no domain with matching
uuid 'b9906489-6d5b-40f8-a742-ca71b2b84277' (core_node)
So it seems like I get the event before the domain is completely ready.
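Until this is clarified, a client can only work around it, for example by
retrying the lookup for a short time. The following is a minimal sketch of
such a retry, not a fix for the underlying race. It is deliberately
independent of any binding: `fetch` and `is_not_found` are hypothetical
callables supplied by the caller, where `fetch` would do the XML dump and
`is_not_found` would recognise the error code 42 shown above.

```python
import time


def fetch_with_retry(fetch, is_not_found, attempts=5, delay=0.1):
    """Call fetch() until it succeeds or the attempts are used up.

    fetch        -- hypothetical callable returning the domain XML
    is_not_found -- hypothetical predicate: True if the exception means
                    "domain not found" and a retry makes sense
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:
            # Re-raise errors that are not the transient lookup failure,
            # and give up after the last attempt.
            if not is_not_found(exc) or attempt == attempts - 1:
                raise
            time.sleep(delay)
```

The number of attempts and the delay are arbitrary here. A retry like this
also cannot distinguish "not ready yet" from "already undefined again".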
4) The libvirt domain description is not versioned
I would expect that every time a domain XML is updated (an update from a
third party) or an event is generated (an update from libvirt), the
resource version of the domain is increased, and that I get this resource
version when I dump the XML or receive an event. Without this there is
afaik no way to stay in sync with libvirt, even with regular polling of
all domains. The main issue here is that I can never know whether events
in the queue arrived before my latest domain resync or after it.
Also note that this is not about delivery guarantees of events. It is just
about having a consistent view of a VM and the individual event. If I have
resource versions, I can decide whether an event is still interesting for
me or not, which is exactly what I need to solve the syncing problem above.
When I do a complete relisting of all domains to sync, I know which version
I got, and I can then see for every event whether it is newer or older.
It would be even better if the domain XML, the VM state, and the resource
version were sent to the client along with the event. Then, whenever there
is a new event for a VM in the queue, I can be sure that the domain XML I
see is the one which triggered the event. This XML is then a complete
representation of that revision number.
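To make the idea concrete, here is a minimal sketch of how a client could
use such a version. Everything in it is hypothetical: libvirt does not
provide the "resource_version", "xml" and "state" fields on events today,
and the DomainCache class is only an illustration of the proposal.

```python
class DomainCache:
    """Hypothetical client-side cache keyed by domain UUID."""

    def __init__(self):
        # uuid -> (resource_version, xml, state)
        self.domains = {}

    def resync(self, listing):
        # `listing` is a full relist: iterable of (uuid, version, xml, state).
        self.domains = {u: (v, x, s) for u, v, x, s in listing}

    def apply_event(self, event):
        # Returns True if the event was newer than the cached view and
        # was applied, False if it was stale and ignored.
        uuid = event["domain_id"]
        version = event["resource_version"]  # hypothetical field
        cached = self.domains.get(uuid)
        if cached is not None and version <= cached[0]:
            return False  # already covered by the resync or a newer event
        self.domains[uuid] = (version, event["xml"], event["state"])
        return True
```

With this, an event that was queued before the latest resync is simply
dropped, because its version is not newer than the one from the relist.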
It would be nice to hear your thoughts on these points.
Best Regards,
Roman
[1]
https://wiki.libvirt.org/page/VM_lifecycle#States_that_a_guest_domain_can...