On 04.06.2019 17:23, Michal Privoznik wrote:
On 6/4/19 11:07 AM, Nikolay Shirokovskiy wrote:
>
>
> On 30.05.2019 12:34, Michal Privoznik wrote:
>> Changing domain definition must be guarded with acquiring modify
>> job. The problem is if there is a thread executing say
>> qemuDomainSetMemoryStatsPeriod() which is an API that acquires
>> modify job and then possibly unlock the domain object and locks
>> it again. However, becasue virDomainObjAssignDef() does not take
>> a job (only object lock) it may have changed the domain
>> definition while the other thread unlocked the domain object in
>> order to talk on the monitor. For instance:
>>
>> Thread1:
>> 1) lookup domain object and lock it
>> 2) acquire job
>> 3) get pointers to live and persistent defs
>> 4) unlock the domain object
>> 5) talk to qemu on the monitor
>>
>> Thread2 (Execute qemuDomainDefineXMLFlags):
>> 1) lookup domain object and lock it
>> 2) virDomainObjAssignDef() which overwrites persistent def and
>> frees the old one
>> 3) unlock domain object
>>
>> Thread1:
>> 6) lock the domain object again
>> 7) try to access the persistent def via pointer stored in 3)
>>
>> Now this will crash because the pointer from step 3) points to a
>> memory that was freed.
>>
>> However, if Thread2 waited and acquired modify job (just like
>> Thread1) then these two would serialize and no harm would be
>> done.
>>
>> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
>> ---
>> src/qemu/qemu_domain.c | 3 ++-
>> src/qemu/qemu_driver.c | 29 +++++++++++++++++++++++++----
>> src/qemu/qemu_migration.c | 7 +++++++
>> 3 files changed, 34 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
>> index c61d39b12b..fff2f1932e 100644
>> --- a/src/qemu/qemu_domain.c
>> +++ b/src/qemu/qemu_domain.c
>> @@ -8784,7 +8784,8 @@ qemuDomainRemoveInactive(virQEMUDriverPtr driver,
>> return;
>> }
>> - qemuDomainRemoveInactiveCommon(driver, vm);
>> + if (vm->def)
>> + qemuDomainRemoveInactiveCommon(driver, vm);
>> virDomainObjListRemove(driver->domains, vm);
>
> Hmm, virDomainObjListRemoveLocked uses vm->def too.
>
>> }
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index 8bbac339e0..77cb7e4b87 100644
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -1706,9 +1706,16 @@ static virDomainPtr qemuDomainCreateXML(virConnectPtr
conn,
>> VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE)))
>> goto cleanup;
>> + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) {
>> + qemuDomainRemoveInactive(driver, vm);
>> + goto cleanup;
>> + }
>> +
>> virDomainObjAssignDef(vm, def, true, NULL);
>> def = NULL;
>> + qemuDomainObjEndJob(driver, vm);
>> +
>
> Looks like we can begin VIR_DOMAIN_JOB_OPERATION_START from the start instead
> of acquiring job twice.
That is an async job. We shouldn't rely on async job when modifying a domain.
To me it looks ok. We don't drop vm lock between begin job and assigning definition
so other threads have no chance to see invalid state.
Nikolay