On Fri, Nov 25, 2016 at 8:23 PM, Michal Privoznik <mprivozn(a)redhat.com>
wrote:
On 25.11.2016 18:33, Roman Mohr wrote:
> On Fri, Nov 25, 2016 at 6:04 PM, Michal Privoznik <mprivozn(a)redhat.com>
> wrote:
>
>> On 25.11.2016 17:54, Roman Mohr wrote:
>>> On Fri, Nov 25, 2016 at 4:34 PM, Michal Privoznik <mprivozn(a)redhat.com
>
>>> wrote:
>>>
>>>> On 25.11.2016 14:38, Roman Mohr wrote:
>>>>> Hi,
>>>>>
>>>>> I recently started to use the libvirt domain events. With them I
>> increase
>>>>> the responsiveness of my VM state wachers.
>>>>> In general it works pretty well. I just listen to the events and do
a
>>>>> periodic resync to cope with missed events.
>>>>>
>>>>> While watching the events I ran into a few interesting situations I
>>>> wanted
>>>>> to share. The points 1-3 describe some minor issues or
irregularities.
>>>>> Point 4 is about the fact that domain and state updates are not
>> versioned
>>>>> which makes it very hard to stay in sync with libvirt when using
>> events.
>>>>>
>>>>> My libvirt version is 1.2.18.4.
>>>>
>>>> This might be the root cause. I'm unable to see some of the
scenarios
>>>> you're seeing. Have you tried the latest release (or even git HEAD)
to
>>>> check whether all the scenarios you are describing still stand?
>>>>
>>>
>>> Definitely better with latest HEAD but still it does not look
completely
>>> right.
>>>
>>>>
>>>>>
>>>>> 1) Event order seems to be weird on startup:
>>>>>
>>>>> When listening for VM lifecycle events I get this order:
>>>>>
>>>>> {"event_type": "Started", "timestamp":
"2016-11-25T11:59:53.209326Z",
>>>>> "reason": "Booted", "domain_name":
"generic", "domain_id":
>>>>> "8ff7047b-fb46-44ff-a4c6-7c20c73ab86e"}
>>>>> {"event_type": "Defined", "timestamp":
"2016-11-25T11:59:53.435530Z",
>>>>> "reason": "Added", "domain_name":
"generic", "domain_id":
>>>>> "8ff7047b-fb46-44ff-a4c6-7c20c73ab86e"}
>>>>>
>>>>> It is strange that a VM already boots before it is defined. Is this
the
>>>>> intended order?
>>>>
>>>> I don't see this order so probable this is fixed upstream.
>>>>
>>>
>>> On latest master a normal creation emits these events:
>>>
>>> event 'lifecycle' for domain testvm: Resumed Unpaused
>>> event 'lifecycle' for domain testvm: Started Booted
>>>
>>> The Resumed event looks wrong. Further I get no more Defined/Undefined
>>> events. Maybe they were removed?
>>
>> Yes, they were removed.
>
>
> Nice
>
>
>> The Resumed event comes from qemu actually,
>> because libvirt starts qemu in paused mode so that we can do some setup
>> (e.g. place vcpu threads into cgroups) and only after that we can resume
>> guest CPUs and in fact let guest start. Once this is done we
>> deliberately emit Started event.
>>
>
> I would expect an event like this:
>
> event 'lifecycle' for domain testvm: Suspended Bootstrapping
>
> before the other two events. That takes the ambiguity from the Resumed
> event.
>
>
>>>
>>>
>>>>
>>>>>
>>>>> 2) Defining a VM with VIR_DOMAIN_START_PAUSED gives me this event
order
>>>>
>>>> I don't think you can define a domain with that flag. What's
the
actual
>>>> action?
>>>>
>>>
>>> That is the flag for the api, when using virsh using `--paused` does
>> that.
>>
>> Ah, that's for virsh create/start not virsh define. Anyway, this is no
>> longer the case with upstream, is it?
>>
>>
> Right
>
>
>>>
>>>
>>>>
>>>>>
>>>>> {"event_type": "Defined", "timestamp":
"2016-11-25T12:02:44.037817Z",
>>>>> "reason": "Added", "domain_name":
"core_node", "domain_id":
>>>>> "b9906489-6d5b-40f8-a742-ca71b2b84277"}
>>>>> {"event_type": "Resumed", "timestamp":
"2016-11-25T12:02:44.813104Z",
>>>>> "reason": "Unpaused", "domain_name":
"core_node", "domain_id":
>>>>> "b9906489-6d5b-40f8-a742-ca71b2b84277"}
>>>>> {"event_type": "Started", "timestamp":
"2016-11-25T12:02:44.813733Z",
>>>>> "reason": "Booted", "domain_name":
"core_node", "domain_id":
>>>>> "b9906489-6d5b-40f8-a742-ca71b2b84277"}
>>>>
>>>>
>>>> Interesting, so here is "defined" event delivered before the
"started"
>>>> event. Also - where is "suspended" event?
>>>>
>>>
>>>
>>> With latest master the situation looks better. Now I see
>>>
>>> event 'lifecycle' for domain testvm: Started Booted
>>> event 'lifecycle' for domain testvm: Suspended Paused
>>
>> Again, both of these are deliberately emitted by libvirt and in fact I
>> think they reflect what is happening.
>>
>>
> Why is in this case not
>
> event 'lifecycle' for domain testvm: Resumed Unpaused
>
> event emitted?
>
> I would expect
>
> event 'lifecycle' for domain testvm: Resumed Unpaused
I fail to see why this is supposed to be emitted. The QEMU is started as
paused. If this even would be emitted, even if immediately followed by
Paused event, it might suggest that guest CPUs have run for a little
while. But that's not true.
So to sum this up, to see if I get it right.
When I directly start a VM I get:
1) event 'lifecycle' for domain testvm: Resumed Unpaused
2) event 'lifecycle' for domain testvm: Started Booted
When I want to start it in paused mode I get:
1) event 'lifecycle' for domain testvm: Started Booted
2) event 'lifecycle' for domain testvm: Suspended Paused
but no Resumed Unpaused as first event.
The "Resumed Unpaused" I get in the first case, is because of internal
libvirt VM setup on a paused VM, right?
This work is not happening before the Started Booted event when I start the
VM in paused mode?
> event 'lifecycle' for domain testvm: Started Booted
> event 'lifecycle' for domain testvm: Suspended Paused
>
>
> So the situation on master is much better but because of the
> Resumed/Unpaused event I still have the feeling that the most simple but
> powerful usecase, watching for CREATE, UPDATE, DELETE is very hard
because
> you can't know if the Resumed/Unpaused is the indicator for CREATE or
> UPDATE.
>
It's neither. Resumed event reflects vCPU state, not the domain XML
change. Maybe we need to document that better.
That would be great :)
For domain create event you need to watch for Started lifecycle event.
I have two questions here:
1) So the domain which triggers the resumed event does not exist before
this Started even although I get notifications assigned to it?
2) Isn't the resumed event still ambiguous then? Normally it implies that
the domain already exists and was paused, but in this case it reflects
internal libvirt work on a not yet created domain.
I don't think it is that much a deal, you can work around that, but it is
something everyone has to keep in mind when working with these events.
But even something simple like having a special event reason, let's say
'Resumed Initialized' would fix that, although the word 'Resumed' just
sounds wrong in this context.
For domain update event you need to watch for device-* event.
For domain shutdown event you need to watch for Stopped lifecycle event.
Michal
Roman