On 05/03/2018 03:54 AM, Nikolay Shirokovskiy wrote:
On 01.05.2018 01:03, John Ferlan wrote:
>
>
> On 04/18/2018 10:44 AM, Nikolay Shirokovskiy wrote:
>> Block job abort operation can not handle properly qemu crashes
>> when waiting for abort/pivot completion. Deadlock scenario is next:
>>
>> - qemuDomainBlockJobAbort waits for pivot/abort completion
>> - qemu crashes, then qemuProcessBeginStopJob broadcasts for VM condition and
>> then waits for job condition (taken by qemuDomainBlockJobAbort)
>> - qemuDomainBlockJobAbort awakes but nothing really changed, VM is still
>> active (vm->def->id != -1) so thread starts waiting for completion
again.
>> Now two threads are in deadlock.
>>
>> First let's add some condition besides domain active status so that waiting
>> thread can check it when awakes. Second let's move signalling to the place
>> where condition is set, namely monitor eof/error handlers. Having signalling
>> in qemuProcessBeginStopJob is not useful.
>>
>> The patch copies monitor error to domain state because at time
>> waiting thread awakes there can be no monitor and it is useful to
>> report monitor error to user.
>>
>> The patch has a drawback that on destroying a domain when waiting for
>> event from monitor we get not very convinient error message like
>
> convenient
>
>> 'EOF from monitor' from waiting API. On the other hand if qemu crashes
>> this is more useful then 'domain is not running'. The first case
>> will be addressed in another patch.
>>
>> The patch also fixes other places where we wait for event from qemu.
>> Namely handling device removal and tray ejection. Other places which
>> used virDomainObjWait (dump, migration, save) were good because of
>> async jobs which allow concurrent destroy job.
>>
>> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy(a)virtuozzo.com>
>> ---
>> src/conf/domain_conf.c | 43 -------------------------------------------
>> src/conf/domain_conf.h | 3 ---
>> src/libvirt_private.syms | 2 --
>> src/qemu/qemu_domain.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>> src/qemu/qemu_domain.h | 5 ++++-
>> src/qemu/qemu_driver.c | 6 +++---
>> src/qemu/qemu_hotplug.c | 4 ++--
>> src/qemu/qemu_migration.c | 12 ++++++------
>> src/qemu/qemu_process.c | 27 ++++++++++++++++++++++-----
>> 9 files changed, 82 insertions(+), 65 deletions(-)
>>
>
> This no longer git am -3 applies and based on previous patch reviews, I
> think perhaps this needs to be redone.
I'll resend as soon as we come to agreement on series. Now I'm not
convinced to change much (except for using distinct flag to indicate error
as mentioned in 2nd patch discussion, then I don't need 3d patch).
See comments below.
>
> I don't believe moving/renaming virDomainObjWait is good/right, but I'm
> sure there would be other opinions on that. Yes, QEMU is currently the
> only consumer... If it were to move, it should move to virdomainobjlist
> since that's where innards of virDomainObjPtr are managed. The fact that
> we look at ->parent.lock, well, <sigh>...
I would not move the function without reason. It gets new name qemuDomainObjWait
becase it use qemu specific data, namely monError.
Today only qemu uses this generic virDomainObjWait which is generic
without the need to have/use qemu specific things. Other domain
consumers could use it if they chose.
I'm not in favor of moving, renaming, repurposing for a very specific
issue for what is a generically used function. Maybe that's just me - so
you could try to get a different reviewer if you want.
>
> Anyway, you're attempting to special case something and perhaps you just
> need to create a qemuDomainObjWait that would call the timeout version
> of the virDomainObjWait and be able to handle whether some other error
> occurred. Wouldn't the LastError before the current wait return NULL
> (as in no error), then during the timeout period if LastError is
> something couldn't that indicate the failure you're looking for.
I introduced qemuDomainObjWait not to put virDomainObjWait and virDomainObjWaitUntil
in the first place but to check monError. Then rather then keeping too functions
it's look more simple to use only one and branch on given timeout.
I would think if the goal was to have specific code for a specific
purpose for specific functions, then introduce the qemuDomainObjWait,
but have it call virDomainObjWait[Until] based on the need you have.
Which in this case appears to fiddle with monError in some way.
Again - that's just my opinion
John
>
> I didn't spend a lot of time thinking about alternatives and how the
> code should change, but when you mention the patch has a drawback - that
> immediately raises concern.
But ... I addressed this issue in next patch as I wrote.
[...]