
On 04.06.2019 17:23, Michal Privoznik wrote:
On 6/4/19 11:07 AM, Nikolay Shirokovskiy wrote:
On 30.05.2019 12:34, Michal Privoznik wrote:
Changing domain definition must be guarded with acquiring modify job. The problem is if there is a thread executing say qemuDomainSetMemoryStatsPeriod() which is an API that acquires modify job and then possibly unlock the domain object and locks it again. However, becasue virDomainObjAssignDef() does not take a job (only object lock) it may have changed the domain definition while the other thread unlocked the domain object in order to talk on the monitor. For instance:
Thread1: 1) lookup domain object and lock it 2) acquire job 3) get pointers to live and persistent defs 4) unlock the domain object 5) talk to qemu on the monitor
Thread2 (Execute qemuDomainDefineXMLFlags): 1) lookup domain object and lock it 2) virDomainObjAssignDef() which overwrites persistent def and frees the old one 3) unlock domain object
Thread1: 6) lock the domain object again 7) try to access the persistent def via pointer stored in 3)
Now this will crash because the pointer from step 3) points to a memory that was freed.
However, if Thread2 waited and acquired modify job (just like Thread1) then these two would serialize and no harm would be done.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 3 ++- src/qemu/qemu_driver.c | 29 +++++++++++++++++++++++++---- src/qemu/qemu_migration.c | 7 +++++++ 3 files changed, 34 insertions(+), 5 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index c61d39b12b..fff2f1932e 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -8784,7 +8784,8 @@ qemuDomainRemoveInactive(virQEMUDriverPtr driver, return; } - qemuDomainRemoveInactiveCommon(driver, vm); + if (vm->def) + qemuDomainRemoveInactiveCommon(driver, vm); virDomainObjListRemove(driver->domains, vm);
Hmm, virDomainObjListRemoveLocked uses vm->def too.
} diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 8bbac339e0..77cb7e4b87 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1706,9 +1706,16 @@ static virDomainPtr qemuDomainCreateXML(virConnectPtr conn, VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE))) goto cleanup; + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) { + qemuDomainRemoveInactive(driver, vm); + goto cleanup; + } + virDomainObjAssignDef(vm, def, true, NULL); def = NULL; + qemuDomainObjEndJob(driver, vm); +
Looks like we can begin VIR_DOMAIN_JOB_OPERATION_START from the start instead of acquiring job twice.
That is an async job. We shouldn't rely on async job when modifying a domain.
To me it looks ok. We don't drop vm lock between begin job and assigning definition so other threads have no chance to see invalid state. Nikolay