On 04.05.2018 17:52, John Ferlan wrote:
On 05/03/2018 03:54 AM, Nikolay Shirokovskiy wrote:
>
>
> On 01.05.2018 01:03, John Ferlan wrote:
>>
>>
>> On 04/18/2018 10:44 AM, Nikolay Shirokovskiy wrote:
>>> Block job abort operation can not handle properly qemu crashes
>>> when waiting for abort/pivot completion. Deadlock scenario is next:
>>>
>>> - qemuDomainBlockJobAbort waits for pivot/abort completion
>>> - qemu crashes, then qemuProcessBeginStopJob broadcasts for VM condition and
>>> then waits for job condition (taken by qemuDomainBlockJobAbort)
>>> - qemuDomainBlockJobAbort awakes but nothing really changed, VM is still
>>> active (vm->def->id != -1) so thread starts waiting for completion
again.
>>> Now two threads are in deadlock.
>>>
>>> First let's add some condition besides domain active status so that
waiting
>>> thread can check it when awakes. Second let's move signalling to the
place
>>> where condition is set, namely monitor eof/error handlers. Having signalling
>>> in qemuProcessBeginStopJob is not useful.
>>>
>>> The patch copies monitor error to domain state because at time
>>> waiting thread awakes there can be no monitor and it is useful to
>>> report monitor error to user.
>>>
>>> The patch has a drawback that on destroying a domain when waiting for
>>> event from monitor we get not very convinient error message like
>>
>> convenient
>>
>>> 'EOF from monitor' from waiting API. On the other hand if qemu
crashes
>>> this is more useful then 'domain is not running'. The first case
>>> will be addressed in another patch.
>>>
>>> The patch also fixes other places where we wait for event from qemu.
>>> Namely handling device removal and tray ejection. Other places which
>>> used virDomainObjWait (dump, migration, save) were good because of
>>> async jobs which allow concurrent destroy job.
>>>
>>> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy(a)virtuozzo.com>
>>> ---
>>> src/conf/domain_conf.c | 43 -------------------------------------------
>>> src/conf/domain_conf.h | 3 ---
>>> src/libvirt_private.syms | 2 --
>>> src/qemu/qemu_domain.c | 45
+++++++++++++++++++++++++++++++++++++++++++++
>>> src/qemu/qemu_domain.h | 5 ++++-
>>> src/qemu/qemu_driver.c | 6 +++---
>>> src/qemu/qemu_hotplug.c | 4 ++--
>>> src/qemu/qemu_migration.c | 12 ++++++------
>>> src/qemu/qemu_process.c | 27 ++++++++++++++++++++++-----
>>> 9 files changed, 82 insertions(+), 65 deletions(-)
>>>
>>
>> This no longer git am -3 applies and based on previous patch reviews, I
>> think perhaps this needs to be redone.
>
> I'll resend as soon as we come to agreement on series. Now I'm not
> convinced to change much (except for using distinct flag to indicate error
> as mentioned in 2nd patch discussion, then I don't need 3d patch).
>
> See comments below.
>
>>
>> I don't believe moving/renaming virDomainObjWait is good/right, but I'm
>> sure there would be other opinions on that. Yes, QEMU is currently the
>> only consumer... If it were to move, it should move to virdomainobjlist
>> since that's where innards of virDomainObjPtr are managed. The fact that
>> we look at ->parent.lock, well, <sigh>...
>
> I would not move the function without reason. It gets new name qemuDomainObjWait
> becase it use qemu specific data, namely monError.
Today only qemu uses this generic virDomainObjWait which is generic
without the need to have/use qemu specific things. Other domain
consumers could use it if they chose.
I'm not in favor of moving, renaming, repurposing for a very specific
issue for what is a generically used function. Maybe that's just me - so
you could try to get a different reviewer if you want.
Sorry.
>
>>
>> Anyway, you're attempting to special case something and perhaps you just
>> need to create a qemuDomainObjWait that would call the timeout version
>> of the virDomainObjWait and be able to handle whether some other error
>> occurred. Wouldn't the LastError before the current wait return NULL
>> (as in no error), then during the timeout period if LastError is
>> something couldn't that indicate the failure you're looking for.
>
> I introduced qemuDomainObjWait not to put virDomainObjWait and virDomainObjWaitUntil
> in the first place but to check monError. Then rather then keeping too functions
> it's look more simple to use only one and branch on given timeout.
>
I would think if the goal was to have specific code for a specific
purpose for specific functions, then introduce the qemuDomainObjWait,
but have it call virDomainObjWait[Until] based on the need you have.
Which in this case appears to fiddle with monError in some way.
Again - that's just my opinion
John
Ok we can do that way. I just think that as virDomainObjWait/virDomainObjWaitUntil
will not be used anywhere in code after the patch then sooner or later somebody will
clean them up.
Nikolay
>>
>> I didn't spend a lot of time thinking about alternatives and how the
>> code should change, but when you mention the patch has a drawback - that
>> immediately raises concern.
>
> But ... I addressed this issue in next patch as I wrote.
>
[...]