On 31.10.2014 15:47, Martin Kletzander wrote:
> When one domain is being undefined and at the same time started, for
> example, there is a possibility of a rare problem occuring.
>
> - Thread 1 does virDomainUndefine(), has the lock, checks that the
> domain is active and because it's not, calls
> virDomainObjListRemove().
>
> - Thread 2 does virDomainCreate() and tries to lock the domain.
>
> - Thread 1 needs to lock domain list in order to remove the domain from
> it, but must unlock domain first (proper order is to lock domain list
> first and the domain itself second).
>
> - Thread 2 grabs the lock, starts the domain and releases the lock.
>
> - Thread 1 grabs the lock and removes the domain from list.
>
> With this patch:
>
> - qemuDomainRemoveInactive() creates a QEMU_JOB_MODIFY if that's
> possible, but since it must remove the domain from list either way,
> it continues even when starting the job failed.
>
> - Because virDomainObjListRemove() removes a reference to the
> virDomainObj (and it must be kept that way for other drivers), we
> hack around that by acquiring one more reference that's kept until
> the job is ended.
>
> Resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1150505
>
> Signed-off-by: Martin Kletzander <mkletzan(a)redhat.com>
> ---
> src/qemu/qemu_domain.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> index 76fccce..98c4763 100644
> --- a/src/qemu/qemu_domain.c
> +++ b/src/qemu/qemu_domain.c
> @@ -2392,9 +2392,13 @@ void
> qemuDomainRemoveInactive(virQEMUDriverPtr driver,
> virDomainObjPtr vm)
> {
> + bool haveJob = true;
> char *snapDir;
> virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
>
> + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0)
> + haveJob = false;
> +
> /* Remove any snapshot metadata prior to removing the domain */
> if (qemuDomainSnapshotDiscardAllMetadata(driver, vm) < 0) {
> VIR_WARN("unable to remove all snapshots for domain %s",
> @@ -2409,8 +2413,13 @@ qemuDomainRemoveInactive(virQEMUDriverPtr driver,
> VIR_WARN("unable to remove snapshot directory %s", snapDir);
> VIR_FREE(snapDir);
> }
> + virObjectRef(vm);
> virDomainObjListRemove(driver->domains, vm);
> virObjectUnref(cfg);
> +
> + if (haveJob)
> + ignore_value(qemuDomainObjEndJob(driver, vm));
> + virObjectUnref(vm);
> }
>
> void
>
I don't think there's a reason to do the Ref() and Unref(). Job control
functions do that already. ACK with that removed.
Oh, they do :) I removed those two lines and the part of commit
message as well. And I'll push it in a while.
Martin