On Fri, Nov 25, 2016 at 4:34 PM, Michal Privoznik <mprivozn(a)redhat.com>
wrote:
On 25.11.2016 14:38, Roman Mohr wrote:
> Hi,
>
> I recently started to use the libvirt domain events. With them I increase
> the responsiveness of my VM state watchers.
> In general it works pretty well. I just listen to the events and do a
> periodic resync to cope with missed events.
>
> While watching the events I ran into a few interesting situations I wanted
> to share. The points 1-3 describe some minor issues or irregularities.
> Point 4 is about the fact that domain and state updates are not versioned
> which makes it very hard to stay in sync with libvirt when using events.
>
> My libvirt version is 1.2.18.4.
This might be the root cause. I'm unable to see some of the scenarios
you're seeing. Have you tried the latest release (or even git HEAD) to
check whether all the scenarios you are describing still stand?
Definitely better with latest HEAD but still it does not look completely
right.
>
> 1) Event order seems to be weird on startup:
>
> When listening for VM lifecycle events I get this order:
>
> {"event_type": "Started", "timestamp": "2016-11-25T11:59:53.209326Z",
> "reason": "Booted", "domain_name": "generic", "domain_id":
> "8ff7047b-fb46-44ff-a4c6-7c20c73ab86e"}
> {"event_type": "Defined", "timestamp": "2016-11-25T11:59:53.435530Z",
> "reason": "Added", "domain_name": "generic", "domain_id":
> "8ff7047b-fb46-44ff-a4c6-7c20c73ab86e"}
>
> It is strange that a VM already boots before it is defined. Is this the
> intended order?
I don't see this order, so this is probably fixed upstream.
On latest master a normal creation emits these events:
event 'lifecycle' for domain testvm: Resumed Unpaused
event 'lifecycle' for domain testvm: Started Booted
The Resumed event looks wrong. Furthermore, I no longer get any
Defined/Undefined events. Maybe they were removed?
>
> 2) Defining a VM with VIR_DOMAIN_START_PAUSED gives me this event order
I don't think you can define a domain with that flag. What's the actual
action?
That is the flag for the API; with virsh, `--paused` does the same.
>
> {"event_type": "Defined", "timestamp": "2016-11-25T12:02:44.037817Z",
> "reason": "Added", "domain_name": "core_node", "domain_id":
> "b9906489-6d5b-40f8-a742-ca71b2b84277"}
> {"event_type": "Resumed", "timestamp": "2016-11-25T12:02:44.813104Z",
> "reason": "Unpaused", "domain_name": "core_node", "domain_id":
> "b9906489-6d5b-40f8-a742-ca71b2b84277"}
> {"event_type": "Started", "timestamp": "2016-11-25T12:02:44.813733Z",
> "reason": "Booted", "domain_name": "core_node", "domain_id":
> "b9906489-6d5b-40f8-a742-ca71b2b84277"}
Interesting, so here the "defined" event is delivered before the "started"
event. Also, where is the "suspended" event?
With latest master the situation looks better. Now I see
event 'lifecycle' for domain testvm: Started Booted
event 'lifecycle' for domain testvm: Suspended Paused
>
> This boot-order makes it hard to track active domains by listening to
> life-cycle events. One could theoretically still always fetch the VM state
> in the event callback and check the state, but if the state is not
> immediately transferred with the event itself, it can already be outdated,
> so this might be racy (intransparent for the libvirt bindings user), and as
> described in (3) currently not even possible. In general the real existing
> events seem to differ quite significantly from the described life-cycle in
> [1].
Again, in the upstream I see something different:
event 'lifecycle' for domain $domain: Started Booted
event 'lifecycle' for domain $domain: Suspended Paused
On master I see that too when I start the VM with `virsh create --paused`.
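
To illustrate why the event order matters for a client that tracks active domains purely from life-cycle events, here is a minimal sketch. It does not use libvirt at all: `ActiveDomainTracker` and the dict-shaped events are hypothetical, modelled on the JSON logs above, and the "Stopped" event name is an assumption that does not appear in this thread.

```python
# Hypothetical tracker: derives a per-domain state from lifecycle events only.
UUID = "b9906489-6d5b-40f8-a742-ca71b2b84277"


class ActiveDomainTracker:
    def __init__(self):
        self.state = {}  # domain UUID -> "running" or "paused"

    def handle(self, event):
        uuid = event["domain_id"]
        kind = event["event_type"]
        if kind in ("Started", "Resumed"):
            self.state[uuid] = "running"
        elif kind == "Suspended":
            self.state[uuid] = "paused"
        elif kind == "Stopped":  # assumed name, not taken from the logs above
            self.state.pop(uuid, None)
        # Defined/Undefined only concern the persistent config; ignored here.


# Order reported with 1.2.18.4 for a paused start.
OLD_ORDER = [
    {"event_type": "Defined", "domain_id": UUID},
    {"event_type": "Resumed", "domain_id": UUID},
    {"event_type": "Started", "domain_id": UUID},
]
# Order reported on master for a paused start.
NEW_ORDER = [
    {"event_type": "Started", "domain_id": UUID},
    {"event_type": "Suspended", "domain_id": UUID},
]


def replay(events):
    tracker = ActiveDomainTracker()
    for event in events:
        tracker.handle(event)
    return tracker.state.get(UUID)


print(replay(OLD_ORDER))  # running
print(replay(NEW_ORDER))  # paused
```

With the old order the tracker ends up believing that a domain started with `--paused` is running; with the order seen on master it ends up with "paused".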
>
> 3) "Defined" event is triggered before the domain is completely defined
>
> {"event_type": "Defined", "timestamp": "2016-11-25T12:02:44.037817Z",
> "reason": "Added", "domain_name": "core_node", "domain_id":
> "b9906489-6d5b-40f8-a742-ca71b2b84277"}
> {"event_type": "Resumed", "timestamp": "2016-11-25T12:02:44.813104Z",
> "reason": "Unpaused", "domain_name": "core_node", "domain_id":
> "b9906489-6d5b-40f8-a742-ca71b2b84277"}
> {"event_type": "Started", "timestamp": "2016-11-25T12:02:44.813733Z",
> "reason": "Booted", "domain_name": "core_node", "domain_id":
> "b9906489-6d5b-40f8-a742-ca71b2b84277"}
>
> When I try to process the first event and do a xmldump I get:
>
> Event: [Code-42] [Domain-10] Domain not found: no domain with matching
> uuid 'b9906489-6d5b-40f8-a742-ca71b2b84277' (core_node)
>
> So it seems like I get the event before the domain is completely ready.
You know that you shouldn't be calling libvirt APIs from event callbacks?
No, good to know. Anyway, just tried to work around the above problems.
So if the Defined/Undefined events were removed deliberately, then only the
problem with the 'Resumed' event on a normal VM start remains.
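
Following the advice above about not calling libvirt APIs from event callbacks, one common pattern is to let the callback only enqueue the event and do the follow-up work on another thread. The sketch below is pure standard-library Python and does not talk to libvirt: the callback's parameter list only loosely mimics a lifecycle callback (it takes a plain UUID string), and `fake_fetch_xml` is a hypothetical stand-in for a real domain lookup and XML dump.

```python
import queue
import threading

events = queue.Queue()


def lifecycle_callback(conn, dom_uuid, event, detail, opaque):
    """Runs on the event loop thread: record the event and return at once."""
    events.put((dom_uuid, event, detail))


def worker(fetch_xml, results):
    """Runs on its own thread, outside the event callback."""
    while True:
        item = events.get()
        if item is None:  # sentinel: stop the worker
            break
        dom_uuid, event, detail = item
        results.append((dom_uuid, event, fetch_xml(dom_uuid)))


def fake_fetch_xml(dom_uuid):
    # Hypothetical placeholder for a real lookup plus XML dump.
    return "<domain><uuid>%s</uuid></domain>" % dom_uuid


results = []
thread = threading.Thread(target=worker, args=(fake_fetch_xml, results))
thread.start()
lifecycle_callback(None, "b9906489-6d5b-40f8-a742-ca71b2b84277",
                   "Defined", "Added", None)
events.put(None)
thread.join()
print(results[0][1])  # Defined
```

The lookup can still fail if the domain is gone by the time the worker runs, so the worker would need to handle that case in a real client.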
>
> 4) The libvirt domain description is not versioned
>
> I would expect that every time I update a domainxml (update from third
> party entity), or an event is generated (update from libvirt), that the
> resource version of a Domain is increased and that I get this resource
> version when I do a xmldump or when I get an event. Without this there is
> afaik no way to stay in sync with libvirt, even if you do regular polling
> of all domains. The main issue here is that I can never know if events in
> the queue arrived before my latest domain resync or after it.
>
> Also note that this is not about delivery guarantees of events. It is just
> about having a consistent view of a VM and the individual event. If I have
> resource versions, I can decide if an event is still interesting for me or
> not, which is exactly what I need to solve the syncing problem above.
> When I do a complete relisting of all domains to sync, I know which version
> I got and I can then see on every event if it is newer or older.
>
> If, alongside the event, the domain xml, the VM state, and the
> resource version were sent to a client, it would be even better. Then,
> whenever there is a new event for a VM in the queue, I can be sure that
> the domainxml I see is the one which triggered the event. This xml is then
> a complete representation for this revision number.
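
To make the proposal above concrete, here is a minimal sketch of how a client could use such a version to drop stale events after a resync. Everything in it is hypothetical: libvirt does not provide a resource version in the setup described in this thread, and `VersionedCache`, its methods and the integer versions are made up for illustration.

```python
class VersionedCache:
    """Hypothetical client-side cache keyed by a per-domain version."""

    def __init__(self):
        self.domains = {}  # domain UUID -> (version, state)

    def resync(self, listing):
        """Replace the cache with a full listing: {uuid: (version, state)}."""
        self.domains = dict(listing)

    def apply_event(self, uuid, version, state):
        """Apply an event only if it is newer than what is already known."""
        known = self.domains.get(uuid)
        if known is not None and version <= known[0]:
            return False  # stale: emitted before the last resync
        self.domains[uuid] = (version, state)
        return True


UUID = "b9906489-6d5b-40f8-a742-ca71b2b84277"
cache = VersionedCache()
cache.resync({UUID: (7, "running")})

stale = cache.apply_event(UUID, 6, "paused")  # was queued before the resync
fresh = cache.apply_event(UUID, 8, "paused")  # happened after the resync
print(stale, fresh, cache.domains[UUID])  # False True (8, 'paused')
```

A domain that disappeared between an event and the resync is not covered by this sketch; that case would need extra handling, for example remembering removed domains for a while.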
I recall some people asking for this. Basically, they were worried that
somebody from outside could manipulate their XMLs without them knowing.
Frankly, I don't recall what our answer to that was.
Having a version number in live XML makes sense. However, it makes less
sense for config XML - there would be no way to start with version
#0 once I've edited the file.
Michal