On 6/4/19 11:07 AM, Nikolay Shirokovskiy wrote:
On 30.05.2019 12:34, Michal Privoznik wrote:
> Changing domain definition must be guarded with acquiring modify
> job. The problem is if there is a thread executing say
> qemuDomainSetMemoryStatsPeriod() which is an API that acquires
> modify job and then possibly unlock the domain object and locks
> it again. However, becasue virDomainObjAssignDef() does not take
> a job (only object lock) it may have changed the domain
> definition while the other thread unlocked the domain object in
> order to talk on the monitor. For instance:
>
> Thread1:
> 1) lookup domain object and lock it
> 2) acquire job
> 3) get pointers to live and persistent defs
> 4) unlock the domain object
> 5) talk to qemu on the monitor
>
> Thread2 (Execute qemuDomainDefineXMLFlags):
> 1) lookup domain object and lock it
> 2) virDomainObjAssignDef() which overwrites persistent def and
> frees the old one
> 3) unlock domain object
>
> Thread1:
> 6) lock the domain object again
> 7) try to access the persistent def via pointer stored in 3)
>
> Now this will crash because the pointer from step 3) points to a
> memory that was freed.
>
> However, if Thread2 waited and acquired modify job (just like
> Thread1) then these two would serialize and no harm would be
> done.
>
> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
> ---
> src/qemu/qemu_domain.c | 3 ++-
> src/qemu/qemu_driver.c | 29 +++++++++++++++++++++++++----
> src/qemu/qemu_migration.c | 7 +++++++
> 3 files changed, 34 insertions(+), 5 deletions(-)
>
> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> index c61d39b12b..fff2f1932e 100644
> --- a/src/qemu/qemu_domain.c
> +++ b/src/qemu/qemu_domain.c
> @@ -8784,7 +8784,8 @@ qemuDomainRemoveInactive(virQEMUDriverPtr driver,
> return;
> }
>
> - qemuDomainRemoveInactiveCommon(driver, vm);
> + if (vm->def)
> + qemuDomainRemoveInactiveCommon(driver, vm);
>
> virDomainObjListRemove(driver->domains, vm);
Hmm, virDomainObjListRemoveLocked uses vm->def too.
> }
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 8bbac339e0..77cb7e4b87 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -1706,9 +1706,16 @@ static virDomainPtr qemuDomainCreateXML(virConnectPtr conn,
> VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE)))
> goto cleanup;
>
> + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) {
> + qemuDomainRemoveInactive(driver, vm);
> + goto cleanup;
> + }
> +
> virDomainObjAssignDef(vm, def, true, NULL);
> def = NULL;
>
> + qemuDomainObjEndJob(driver, vm);
> +
Looks like we can begin VIR_DOMAIN_JOB_OPERATION_START from the start instead
of acquiring job twice.
That is an async job. We shouldn't rely on async job when modifying a
domain.
> if (qemuProcessBeginJob(driver, vm, VIR_DOMAIN_JOB_OPERATION_START,
> flags) < 0) {
> qemuDomainRemoveInactiveJob(driver, vm);
> @@ -6976,6 +6983,11 @@ qemuDomainRestoreFlags(virConnectPtr conn,
> VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE)))
> goto cleanup;
>
> + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) {
> + qemuDomainRemoveInactive(driver, vm);
> + goto cleanup;
> + }
> +
> virDomainObjAssignDef(vm, def, true, NULL);
> def = NULL;
>
> @@ -6988,6 +7000,7 @@ qemuDomainRestoreFlags(virConnectPtr conn,
> priv = vm->privateData;
> priv->hookRun = true;
> }> + qemuDomainObjEndJob(driver, vm);
Same here
>
> if (qemuProcessBeginJob(driver, vm, VIR_DOMAIN_JOB_OPERATION_RESTORE,
> flags) < 0)
> @@ -7651,6 +7664,11 @@ qemuDomainDefineXMLFlags(virConnectPtr conn,
> driver->xmlopt, 0)))
> goto cleanup;
>
> + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) {
> + qemuDomainRemoveInactive(driver, vm);
> + goto cleanup;
> + }
> +
> virDomainObjAssignDef(vm, def, false, &oldDef);
> def = NULL;
>
> @@ -7673,7 +7691,7 @@ qemuDomainDefineXMLFlags(virConnectPtr conn,
> vm->persistent = 0;
> qemuDomainRemoveInactiveJob(driver, vm);
> }
> - goto cleanup;
> + goto endjob;
> }
>
> event = virDomainEventLifecycleNewFromObj(vm,
> @@ -7685,6 +7703,9 @@ qemuDomainDefineXMLFlags(virConnectPtr conn,
> VIR_INFO("Creating domain '%s'", vm->def->name);
> dom = virGetDomain(conn, vm->def->name, vm->def->uuid,
vm->def->id);
>
> + endjob:
> + qemuDomainObjEndJob(driver, vm);
> +
> cleanup:
> virDomainDefFree(oldDef);
> virDomainDefFree(def);
Additionally we need to replace qemuDomainRemoveInactiveJob with
qemuDomainRemoveInactive
as we have a job already.
I'm planning on wrapping what you suggested in your reply to the cover
letter into a single function. This way, the change won't be that
intrusive and a lot of these changes won't be made.
Michal