On 20.06.2016 10:40, Nikolay Shirokovskiy wrote:
This patch is not critical but nice to have. The original
motivation
was error message in logs on undefining domain thru vz driver.
Undefine procedure drops domain lock while waiting for detaching
disks vz sdk call. Meanwhile vz sdk event domain-config-changed
arrives, its handler finds domain and is blocked waiting for job
condition. After undefine API call finishes event processing procedes
and tries to refreshes domain config thru existing vz sdk domain handle.
Domain does not exists anymore and event processing fails. Everything
is fine we just don't want to see error message in log for this
particular case.
Fortunately domain has flag that domain is removed from list. This
also imply that vz sdk domain is also undefined. Thus if we check
for this flag right after domain is locked again on accuiring
job condition we gracefully handle this situation.
Actually the race can happen in other situations too. Any
time we wait for job condition in mutualy exclusive job in
time when we acquire it vz sdk domain can cease to exist.
So instead of general internal error we can return domain
not found which is easier to handle. We don't need to patch
other places in mutually exclusive jobs where domain lock
is dropped as if job is started domain can't be undefine
by mutually exclusive undefine job.
In case of API calls that are not jobs (load snapshots etc) we'd
better to check if domain exists every time domain lock is
reacquired. Fortunately these calls drop domain lock only
once to call appropriate vz sdk API call.
The code of this patch is quite similar to qemu driver checks
for is domain is active after acquiring a job. The difference
only while qemu domain is operational while process is active
vz domain is operational while domain exists.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy(a)virtuozzo.com>
---
src/vz/vz_driver.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
src/vz/vz_sdk.c | 32 +++++++++++++++++++++++++++++---
2 files changed, 75 insertions(+), 3 deletions(-)
[snip]
Another case of hindsight is 20/20 )) I think the below changes are
not good and should be dropped from the patch. We can't rely on 'removing'
flag in case other then mutually exclusive jobs.
+static void
+vzDomainObjCheckExist(virDomainObjPtr dom)
+{
+ char uuidstr[VIR_UUID_STRING_BUFLEN];
+
+ if (!dom->removing)
+ return;
+
+ virUUIDFormat(dom->def->uuid, uuidstr);
+ virReportError(VIR_ERR_NO_DOMAIN,
+ _("no domain with matching uuid '%s' (%s)"),
+ uuidstr, dom->def->name);
+}
+
int
prlsdkDomainManagedSaveRemove(virDomainObjPtr dom)
{
@@ -3959,8 +3979,10 @@ prlsdkDomainManagedSaveRemove(virDomainObjPtr dom)
PRL_HANDLE job;
job = PrlVm_DropSuspendedState(privdom->sdkdom);
- if (PRL_FAILED(waitDomainJob(job, dom)))
+ if (PRL_FAILED(waitDomainJob(job, dom))) {
+ vzDomainObjCheckExist(dom);
return -1;
+ }
return 0;
}
@@ -4394,8 +4416,10 @@ prlsdkLoadSnapshots(virDomainObjPtr dom)
char *treexml = NULL;
job = PrlVm_GetSnapshotsTreeEx(privdom->sdkdom, PGST_WITHOUT_SCREENSHOTS);
- if (PRL_FAILED(getDomainJobResult(job, dom, &result)))
+ if (PRL_FAILED(getDomainJobResult(job, dom, &result))) {
+ vzDomainObjCheckExist(dom);
goto cleanup;
+ }
if (!(treexml = prlsdkGetStringParamVar(PrlResult_GetParamAsString, result)))
goto cleanup;
@@ -4427,8 +4451,10 @@ int prlsdkDeleteSnapshot(virDomainObjPtr dom, const char *uuid,
bool children)
PRL_HANDLE job;
job = PrlVm_DeleteSnapshot(privdom->sdkdom, uuid, children);
- if (PRL_FAILED(waitDomainJob(job, dom)))
+ if (PRL_FAILED(waitDomainJob(job, dom))) {
+ vzDomainObjCheckExist(dom);
return -1;
+ }
return 0;
}