
On 05/03/2018 03:54 AM, Nikolay Shirokovskiy wrote:
On 01.05.2018 01:03, John Ferlan wrote:
On 04/18/2018 10:44 AM, Nikolay Shirokovskiy wrote:
Block job abort operation can not handle properly qemu crashes when waiting for abort/pivot completion. Deadlock scenario is next:
- qemuDomainBlockJobAbort waits for pivot/abort completion - qemu crashes, then qemuProcessBeginStopJob broadcasts for VM condition and then waits for job condition (taken by qemuDomainBlockJobAbort) - qemuDomainBlockJobAbort awakes but nothing really changed, VM is still active (vm->def->id != -1) so thread starts waiting for completion again. Now two threads are in deadlock.
First let's add some condition besides domain active status so that waiting thread can check it when awakes. Second let's move signalling to the place where condition is set, namely monitor eof/error handlers. Having signalling in qemuProcessBeginStopJob is not useful.
The patch copies monitor error to domain state because at time waiting thread awakes there can be no monitor and it is useful to report monitor error to user.
The patch has a drawback that on destroying a domain when waiting for event from monitor we get not very convinient error message like
convenient
'EOF from monitor' from waiting API. On the other hand if qemu crashes this is more useful then 'domain is not running'. The first case will be addressed in another patch.
The patch also fixes other places where we wait for event from qemu. Namely handling device removal and tray ejection. Other places which used virDomainObjWait (dump, migration, save) were good because of async jobs which allow concurrent destroy job.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/conf/domain_conf.c | 43 ------------------------------------------- src/conf/domain_conf.h | 3 --- src/libvirt_private.syms | 2 -- src/qemu/qemu_domain.c | 45 +++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 5 ++++- src/qemu/qemu_driver.c | 6 +++--- src/qemu/qemu_hotplug.c | 4 ++-- src/qemu/qemu_migration.c | 12 ++++++------ src/qemu/qemu_process.c | 27 ++++++++++++++++++++++----- 9 files changed, 82 insertions(+), 65 deletions(-)
This no longer git am -3 applies and based on previous patch reviews, I think perhaps this needs to be redone.
I'll resend as soon as we come to agreement on series. Now I'm not convinced to change much (except for using distinct flag to indicate error as mentioned in 2nd patch discussion, then I don't need 3d patch).
See comments below.
I don't believe moving/renaming virDomainObjWait is good/right, but I'm sure there would be other opinions on that. Yes, QEMU is currently the only consumer... If it were to move, it should move to virdomainobjlist since that's where innards of virDomainObjPtr are managed. The fact that we look at ->parent.lock, well, <sigh>...
I would not move the function without reason. It gets new name qemuDomainObjWait becase it use qemu specific data, namely monError.
Today only qemu uses this generic virDomainObjWait which is generic without the need to have/use qemu specific things. Other domain consumers could use it if they chose. I'm not in favor of moving, renaming, repurposing for a very specific issue for what is a generically used function. Maybe that's just me - so you could try to get a different reviewer if you want.
Anyway, you're attempting to special case something and perhaps you just need to create a qemuDomainObjWait that would call the timeout version of the virDomainObjWait and be able to handle whether some other error occurred. Wouldn't the LastError before the current wait return NULL (as in no error), then during the timeout period if LastError is something couldn't that indicate the failure you're looking for.
I introduced qemuDomainObjWait not to put virDomainObjWait and virDomainObjWaitUntil in the first place but to check monError. Then rather then keeping too functions it's look more simple to use only one and branch on given timeout.
I would think if the goal was to have specific code for a specific purpose for specific functions, then introduce the qemuDomainObjWait, but have it call virDomainObjWait[Until] based on the need you have. Which in this case appears to fiddle with monError in some way. Again - that's just my opinion John
I didn't spend a lot of time thinking about alternatives and how the code should change, but when you mention the patch has a drawback - that immediately raises concern.
But ... I addressed this issue in next patch as I wrote.
[...]