
On 20.06.2016 10:40, Nikolay Shirokovskiy wrote:
This patch is not critical but nice to have. The original motivation was error message in logs on undefining domain thru vz driver. Undefine procedure drops domain lock while waiting for detaching disks vz sdk call. Meanwhile vz sdk event domain-config-changed arrives, its handler finds domain and is blocked waiting for job condition. After undefine API call finishes event processing procedes and tries to refreshes domain config thru existing vz sdk domain handle. Domain does not exists anymore and event processing fails. Everything is fine we just don't want to see error message in log for this particular case.
Fortunately domain has flag that domain is removed from list. This also imply that vz sdk domain is also undefined. Thus if we check for this flag right after domain is locked again on accuiring job condition we gracefully handle this situation.
Actually the race can happen in other situations too. Any time we wait for job condition in mutualy exclusive job in time when we acquire it vz sdk domain can cease to exist. So instead of general internal error we can return domain not found which is easier to handle. We don't need to patch other places in mutually exclusive jobs where domain lock is dropped as if job is started domain can't be undefine by mutually exclusive undefine job.
In case of API calls that are not jobs (load snapshots etc) we'd better to check if domain exists every time domain lock is reacquired. Fortunately these calls drop domain lock only once to call appropriate vz sdk API call.
The code of this patch is quite similar to qemu driver checks for is domain is active after acquiring a job. The difference only while qemu domain is operational while process is active vz domain is operational while domain exists.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/vz/vz_driver.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ src/vz/vz_sdk.c | 32 +++++++++++++++++++++++++++++--- 2 files changed, 75 insertions(+), 3 deletions(-)
[snip] Another case of hindsight is 20/20 )) I think the below changes are not good and should be dropped from the patch. We can't rely on 'removing' flag in case other then mutually exclusive jobs.
+static void +vzDomainObjCheckExist(virDomainObjPtr dom) +{ + char uuidstr[VIR_UUID_STRING_BUFLEN]; + + if (!dom->removing) + return; + + virUUIDFormat(dom->def->uuid, uuidstr); + virReportError(VIR_ERR_NO_DOMAIN, + _("no domain with matching uuid '%s' (%s)"), + uuidstr, dom->def->name); +} + int prlsdkDomainManagedSaveRemove(virDomainObjPtr dom) { @@ -3959,8 +3979,10 @@ prlsdkDomainManagedSaveRemove(virDomainObjPtr dom) PRL_HANDLE job;
job = PrlVm_DropSuspendedState(privdom->sdkdom); - if (PRL_FAILED(waitDomainJob(job, dom))) + if (PRL_FAILED(waitDomainJob(job, dom))) { + vzDomainObjCheckExist(dom); return -1; + }
return 0; } @@ -4394,8 +4416,10 @@ prlsdkLoadSnapshots(virDomainObjPtr dom) char *treexml = NULL;
job = PrlVm_GetSnapshotsTreeEx(privdom->sdkdom, PGST_WITHOUT_SCREENSHOTS); - if (PRL_FAILED(getDomainJobResult(job, dom, &result))) + if (PRL_FAILED(getDomainJobResult(job, dom, &result))) { + vzDomainObjCheckExist(dom); goto cleanup; + }
if (!(treexml = prlsdkGetStringParamVar(PrlResult_GetParamAsString, result))) goto cleanup; @@ -4427,8 +4451,10 @@ int prlsdkDeleteSnapshot(virDomainObjPtr dom, const char *uuid, bool children) PRL_HANDLE job;
job = PrlVm_DeleteSnapshot(privdom->sdkdom, uuid, children); - if (PRL_FAILED(waitDomainJob(job, dom))) + if (PRL_FAILED(waitDomainJob(job, dom))) { + vzDomainObjCheckExist(dom); return -1; + }
return 0; }