On 25.11.2016 14:38, Roman Mohr wrote:
Hi,
I recently started to use the libvirt domain events. I use them to improve
the responsiveness of my VM state watchers.
In general it works pretty well. I just listen to the events and do a
periodic resync to cope with missed events.
While watching the events I ran into a few interesting situations I wanted
to share. The points 1-3 describe some minor issues or irregularities.
Point 4 is about the fact that domain and state updates are not versioned
which makes it very hard to stay in sync with libvirt when using events.
My libvirt version is 1.2.18.4.
This might be the root cause. I'm unable to see some of the scenarios
you're seeing. Have you tried the latest release (or even git HEAD) to
check whether all the scenarios you are describing still stand?
1) Event order seems to be weird on startup:
When listening for VM lifecycle events I get this order:
{"event_type": "Started", "timestamp": "2016-11-25T11:59:53.209326Z",
 "reason": "Booted", "domain_name": "generic",
 "domain_id": "8ff7047b-fb46-44ff-a4c6-7c20c73ab86e"}
{"event_type": "Defined", "timestamp": "2016-11-25T11:59:53.435530Z",
 "reason": "Added", "domain_name": "generic",
 "domain_id": "8ff7047b-fb46-44ff-a4c6-7c20c73ab86e"}
It is strange that a VM already boots before it is defined. Is this the
intended order?
I don't see this order, so this is probably fixed upstream.
2) Defining a VM with VIR_DOMAIN_START_PAUSED gives me this event order:
I don't think you can define a domain with that flag. What's the actual
action?
{"event_type": "Defined", "timestamp": "2016-11-25T12:02:44.037817Z",
 "reason": "Added", "domain_name": "core_node",
 "domain_id": "b9906489-6d5b-40f8-a742-ca71b2b84277"}
{"event_type": "Resumed", "timestamp": "2016-11-25T12:02:44.813104Z",
 "reason": "Unpaused", "domain_name": "core_node",
 "domain_id": "b9906489-6d5b-40f8-a742-ca71b2b84277"}
{"event_type": "Started", "timestamp": "2016-11-25T12:02:44.813733Z",
 "reason": "Booted", "domain_name": "core_node",
 "domain_id": "b9906489-6d5b-40f8-a742-ca71b2b84277"}
Interesting, so here the "defined" event is delivered before the "started"
event. Also, where is the "suspended" event?
This boot order makes it hard to track active domains by listening to
life-cycle events. One could theoretically still fetch the VM state in the
event callback and check it, but if the state is not transferred with the
event itself, it can already be outdated by then, so this might be racy (in
a way the user of the libvirt bindings cannot see), and as described in (3)
it is currently not even possible. In general, the events actually emitted
seem to differ quite significantly from the life-cycle described in [1].
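A minimal sketch of tracking active domains in a way that does not depend on
whether "Defined" arrives before or after "Started". It assumes the events are
serialized as the JSON objects shown above. The "Suspended" and "Stopped" type
strings are assumptions modelled on libvirt's lifecycle event names; they do
not appear in the logs above. The function name track_active is hypothetical.

```python
import json

# Any of these events implies the domain currently has a running instance.
# "Suspended" is an assumption: it does not appear in the logs above.
ACTIVE_EVENTS = {"Started", "Suspended", "Resumed"}
# "Stopped" is likewise an assumption about how the serializer names it.
INACTIVE_EVENTS = {"Stopped"}


def track_active(lines):
    """Replay JSON-encoded lifecycle events and return the active UUIDs."""
    active = set()
    for line in lines:
        event = json.loads(line)
        uuid = event["domain_id"]
        if event["event_type"] in ACTIVE_EVENTS:
            active.add(uuid)
        elif event["event_type"] in INACTIVE_EVENTS:
            active.discard(uuid)
        # "Defined" and any other event type leave the active set unchanged,
        # so their position relative to "Started" does not matter.
    return active
```

Because "Defined" is ignored for the purpose of the active set, both the order
in (1) and the order in (2) yield the same result. This does not remove the
need for a periodic resync, since events can still be missed.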
Again, upstream I see something different:
event 'lifecycle' for domain $domain: Started Booted
event 'lifecycle' for domain $domain: Suspended Paused
3) "Defined" event is triggered before the domain is completely defined
{"event_type": "Defined", "timestamp": "2016-11-25T12:02:44.037817Z",
 "reason": "Added", "domain_name": "core_node",
 "domain_id": "b9906489-6d5b-40f8-a742-ca71b2b84277"}
{"event_type": "Resumed", "timestamp": "2016-11-25T12:02:44.813104Z",
 "reason": "Unpaused", "domain_name": "core_node",
 "domain_id": "b9906489-6d5b-40f8-a742-ca71b2b84277"}
{"event_type": "Started", "timestamp": "2016-11-25T12:02:44.813733Z",
 "reason": "Booted", "domain_name": "core_node",
 "domain_id": "b9906489-6d5b-40f8-a742-ca71b2b84277"}
When I try to process the first event and do an XML dump, I get:
Event: [Code-42] [Domain-10] Domain not found: no domain with matching
uuid 'b9906489-6d5b-40f8-a742-ca71b2b84277' (core_node)
So it seems like I get the event before the domain is completely ready.
You know that you shouldn't be calling libvirt APIs from event callbacks?
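One common way to follow that advice is to let the callback only enqueue the
event and do all further work on a separate thread. The sketch below shows the
hand-off with the Python standard library only. The class name EventPump and
the (domain_name, event_type, reason) callback signature are hypothetical and
are not the libvirt bindings' signature; any libvirt call, such as the XML
dump, would go into the handler that runs on the worker thread. Whether this
also avoids the "Domain not found" error above is not established here.

```python
import queue
import threading


class EventPump:
    """Decouple event delivery from event processing via a FIFO queue."""

    def __init__(self, handle):
        self._queue = queue.Queue()
        self._handle = handle  # called on the worker thread, one event at a time
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def callback(self, domain_name, event_type, reason):
        # Hypothetical callback signature: record the event and return at once.
        self._queue.put((domain_name, event_type, reason))

    def stop(self):
        # None is the shutdown marker; events queued before it are still handled.
        self._queue.put(None)
        self._thread.join()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:
                return
            self._handle(*item)
```

With a single worker the events are handled in the order they were enqueued,
so the handler sees the same sequence the callback received.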
4) The libvirt domain description is not versioned
I would expect that every time I update a domainxml (update from third
party entity), or an event is generated (update from libvirt), that the
resource version of a Domain is increased and that I get this resource
version when I do a xmldump or when I get an event. Without this there is
afaik no way to stay in sync with libvirt, even if you do regular polling
of all domains. The main issue here is that I can never know if events in
the queue arrived before my latest domain resync or after it.
Also note that this is not about delivery guarantees of events. It is just
about having a consistent view of a VM and the individual event. If I have
resource versions, I can decide if an event is still interesting for me or
not, which is exactly what I need to solve the syncing problem above.
When I do a complete relisting of all domains to sync, I know which version
I got and I can then see on every event if it is newer or older.
It would be even better if the domain XML, the VM state, and the resource
version were sent to the client alongside the event. Then, whenever there is
a new event for a VM in the queue, I can be sure that the domain XML I see is
the one which triggered the event. This XML is then a complete
representation of that revision number.
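To illustrate what such a resource version would allow, here is a minimal
sketch of a client-side cache that drops events older than the last resync.
The version field is hypothetical: as described above, libvirt does not
provide one, so neither the relisting nor the events carry it today. The class
name DomainCache and its methods are also made up for illustration.

```python
class DomainCache:
    """Keep the newest known (version, state) per domain UUID."""

    def __init__(self):
        self._domains = {}  # uuid -> (version, state)

    def resync(self, listing):
        """Replace the cache with a full relisting of (uuid, version, state)."""
        self._domains = {uuid: (version, state) for uuid, version, state in listing}

    def apply_event(self, uuid, version, state):
        """Apply an event; return False if it is not newer than what is cached."""
        known = self._domains.get(uuid)
        if known is not None and version <= known[0]:
            return False  # event predates (or equals) the cached revision
        self._domains[uuid] = (version, state)
        return True

    def state(self, uuid):
        """Return the cached state for uuid, or None if it is unknown."""
        entry = self._domains.get(uuid)
        return None if entry is None else entry[1]
```

An event that was queued before the relisting carries a version that is not
greater than the cached one and is ignored; an event generated afterwards
carries a greater version and is applied.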
I recall some people asking for this. Basically, they were worried that
somebody from outside could manipulate their XMLs without them knowing.
Frankly, I don't recall what our answer to that was.
Having a version number in live XML makes sense. However, it makes less
sense for config XML - there would be no way to start with version
#0 once I've edited the file.
Michal