[libvirt] [PATCH] qemu: undefine is not allowed during domain starting up

Start a domain whilst undefine it, if starting failed duing ProcessLaunch, on which period qemu exited unexpectedly, the operation will lead to failure of undefine the domain until libvirtd restarted. The reason is that libvirtd will unlock vm during qemuProcessStart, qemuDomainUndefineFlags can get the lock and set vm->persistent 0 but not remove the "active" domain. Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> --- src/conf/domain_conf.h | 1 + src/qemu/qemu_driver.c | 6 ++++++ src/qemu/qemu_process.c | 3 +++ 3 files changed, 10 insertions(+) diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index af15ee8..f339f84 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2468,6 +2468,7 @@ struct _virDomainObj { virDomainSnapshotObjPtr current_snapshot; bool hasManagedSave; + bool starting; void *privateData; void (*privateDataFreeFunc)(void *); diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 6568def..5d9acfc 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7317,6 +7317,12 @@ qemuDomainUndefineFlags(virDomainPtr dom, if (!(vm = qemuDomObjFromDomain(dom))) return -1; + if (vm->starting) { + virReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("cannot undefine during domain starting up")); + goto cleanup; + } + cfg = virQEMUDriverGetConfig(driver); if (virDomainUndefineFlagsEnsureACL(dom->conn, vm->def) < 0) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 525521a..7b708be 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -5847,6 +5847,8 @@ qemuProcessStart(virConnectPtr conn, if (!migrateFrom && !snapshot) flags |= VIR_QEMU_PROCESS_START_NEW; + vm->starting = true; + if (qemuProcessInit(driver, vm, updatedCPU, asyncJob, !!migrateFrom, flags) < 0) goto cleanup; @@ -5892,6 +5894,7 @@ qemuProcessStart(virConnectPtr conn, ret = 0; cleanup: + vm->starting = false; qemuProcessIncomingDefFree(incoming); return ret; -- 1.8.3.1

On Sat, Jul 22, 2017 at 04:55:49 -0400, Yi Wang wrote:
Start a domain whilst undefine it, if starting failed duing ProcessLaunch, on which period qemu exited unexpectedly, the operation will lead to failure of undefine the domain until libvirtd restarted. The reason is that libvirtd will unlock vm during qemuProcessStart, qemuDomainUndefineFlags can get the lock and set vm->persistent 0 but not remove the "active" domain.
Shouldn't the startup code handle that? It definitely works when starting a transient domain, so making it transient while the startup code is executed should be the same case. Since we copy the definition prior to startup, there really should not be any problem in making the VM transient while it's being started.

On 07/24/2017 04:06 AM, Peter Krempa wrote:
On Sat, Jul 22, 2017 at 04:55:49 -0400, Yi Wang wrote:
Start a domain whilst undefine it, if starting failed duing ProcessLaunch, on which period qemu exited unexpectedly, the operation will lead to failure of undefine the domain until libvirtd restarted. The reason is that libvirtd will unlock vm during qemuProcessStart, qemuDomainUndefineFlags can get the lock and set vm->persistent 0 but not remove the "active" domain.
Shouldn't the startup code handle that? It definitely works when starting a transient domain, so making it transient while the startup code is executed should be the same case.
Since we copy the definition prior to startup, there really should not be any problem in making the VM transient while it's being started.
FWIW: This patch started as: https://www.redhat.com/archives/libvir-list/2017-June/msg01081.html I reviewed earlier this month: https://www.redhat.com/archives/libvir-list/2017-July/msg00278.html but responses to the initial review and a couple followups by the submitter got unthreaded... A trail of a few breadcrumbs: https://www.redhat.com/archives/libvir-list/2017-July/msg00387.html https://www.redhat.com/archives/libvir-list/2017-July/msg00762.html https://www.redhat.com/archives/libvir-list/2017-July/msg00864.html In any case, the crux of the issue is that during startup the domain obj lock is released right around the time we go to start the monitor (and the vmagent) - an extra reference is taken to avoid obj deletion. In another thread there's an Undefine run. That thread gets the lock, finds a persistent domain and then clears that persistent bit at the end, calls RemoveInactive. By that time, the domain startup thread gets the obj lock back. The Undefine running caused some issues, causing the start to go into it's failure path, but that fails to remove the domain. The command used to start/undefine buried in one response is virsh start win7 &; sleep 0.2; virsh undefine win7 John

SGkgUGV0ZXIsDQoNClRoYW5rcyBmb3IgeW91ciByZXBseS4NCg0KPk9uIFNhdCwgSnVsIDIyLCAy MDE3IGF0IDA0OjU1OjQ5IC0wNDAwLCBZaSBXYW5nIHdyb3RlOg0KDQo+PiBTdGFydCBhIGRvbWFp biB3aGlsc3QgdW5kZWZpbmUgaXQsIGlmIHN0YXJ0aW5nIGZhaWxlZCBkdWluZyBQcm9jZXNzTGF1 bmNoLA0KDQo+PiBvbiB3aGljaCBwZXJpb2QgcWVtdSBleGl0ZWQgdW5leHBlY3RlZGx5LCB0aGUg b3BlcmF0aW9uIHdpbGwgbGVhZCB0byBmYWlsdXJlDQoNCj4+IG9mIHVuZGVmaW5lIHRoZSBkb21h aW4gdW50aWwgbGlidmlydGQgcmVzdGFydGVkLiBUaGUgcmVhc29uIGlzIHRoYXQgbGlidmlydGQN Cg0KPj4gd2lsbCB1bmxvY2sgdm0gZHVyaW5nIHFlbXVQcm9jZXNzU3RhcnQsIHFlbXVEb21haW5V bmRlZmluZUZsYWdzIGNhbiBnZXQgdGhlDQoNCj4+IGxvY2sgYW5kIHNldCB2bS0+cGVyc2lzdGVu dCAwIGJ1dCBub3QgcmVtb3ZlIHRoZSAiYWN0aXZlIiBkb21haW4uDQoNCj4NCg0KPlNob3VsZG4n dCB0aGUgc3RhcnR1cCBjb2RlIGhhbmRsZSB0aGF0PyBJdCBkZWZpbml0ZWx5IHdvcmtzIHdoZW4N Cg0KPnN0YXJ0aW5nIGEgdHJhbnNpZW50IGRvbWFpbiwgc28gbWFraW5nIGl0IHRyYW5zaWVudCB3 aGlsZSB0aGUgc3RhcnR1cA0KDQo+Y29kZSBpcyBleGVjdXRlZCBzaG91bGQgYmUgdGhlIHNhbWUg Y2FzZS4NCg0KPg0KDQo+U2luY2Ugd2UgY29weSB0aGUgZGVmaW5pdGlvbiBwcmlvciB0byBzdGFy dHVwLCB0aGVyZSByZWFsbHkgc2hvdWxkIG5vdA0KDQo+YmUgYW55IHByb2JsZW0gaW4gbWFraW5n IHRoZSBWTSB0cmFuc2llbnQgd2hpbGUgaXQncyBiZWluZyBzdGFydGVkLg0KDQoNCg0KDQpZZWFo LCBpdCBpcyBhbGxvd2VkIHRvIG1ha2UgVk0gdHJhbnNpZW50IGR1cmluZyBiZWluZyBzdGFydGVk LCBidXQgdGhlIHByb2JsZW0NCg0KaXMgdGhhdCBpZiB0aGUgc2l0dWF0aW9uIEkgZGVzY3JpYmVk IGFib3ZlIG9jY3Vycywgd2UgY2FuJ3QgdW5kZWZpbmUgdGhhdCBWTSB1bnRpbA0KDQpsaWJ2aXJ0 ZCByZXN0YXJ0ZWQuDQoNCk1vcmUgZGV0YWlscyBoYXZlIGJlZW4gaW50cm9kdWNlZCBpbiBhbm90 aGVyIHRocmVhZCBbMV0sIHBsZWFzZSByZXZpZXcgdGhhdCBwaWVjZSA6KQ0KDQpBcyBkaXNjdXNz ZWQgd2l0aCBKb2huIGluIHRoYXQgdGhyZWFkLCBJIHN1cHBvc2UgYSB3aG9sZSBuZXcgcGF0Y2gg aW5zdGVhZCBvZiBhDQoNClYyIHZlcnNpb24gaXMgYmV0dGVyLg0KDQpUaGFua3MgYWdhaW4gOikN Cg0KDQoNCg0KDQoNCg0KWzFdIGh0dHBzOi8vd3d3LnJlZGhhdC5jb20vYXJjaGl2ZXMvbGlidmly LWxpc3QvMjAxNy1KdWx5L21zZzAwNzYyLmh0bWwNCg0KDQoNCg0KDQoNCg0KDQotLS0NCg0KQmVz dCB3aXNoZXMNCg0KWWkgV2FuZw==

On 07/22/2017 04:55 AM, Yi Wang wrote:
Start a domain whilst undefine it, if starting failed duing ProcessLaunch, on which period qemu exited unexpectedly, the operation will lead to failure of undefine the domain until libvirtd restarted. The reason is that libvirtd will unlock vm during qemuProcessStart, qemuDomainUndefineFlags can get the lock and set vm->persistent 0 but not remove the "active" domain.
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> --- src/conf/domain_conf.h | 1 + src/qemu/qemu_driver.c | 6 ++++++ src/qemu/qemu_process.c | 3 +++ 3 files changed, 10 insertions(+)
Can you apply a couple of recent patches, see: https://www.redhat.com/archives/libvir-list/2017-August/msg00389.html and see if those would resolve what you're seeing... without these changes of course... Tks - John
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index af15ee8..f339f84 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2468,6 +2468,7 @@ struct _virDomainObj { virDomainSnapshotObjPtr current_snapshot;
bool hasManagedSave; + bool starting;
void *privateData; void (*privateDataFreeFunc)(void *); diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 6568def..5d9acfc 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -7317,6 +7317,12 @@ qemuDomainUndefineFlags(virDomainPtr dom, if (!(vm = qemuDomObjFromDomain(dom))) return -1;
+ if (vm->starting) { + virReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("cannot undefine during domain starting up")); + goto cleanup; + } + cfg = virQEMUDriverGetConfig(driver);
if (virDomainUndefineFlagsEnsureACL(dom->conn, vm->def) < 0) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 525521a..7b708be 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -5847,6 +5847,8 @@ qemuProcessStart(virConnectPtr conn, if (!migrateFrom && !snapshot) flags |= VIR_QEMU_PROCESS_START_NEW;
+ vm->starting = true; + if (qemuProcessInit(driver, vm, updatedCPU, asyncJob, !!migrateFrom, flags) < 0) goto cleanup; @@ -5892,6 +5894,7 @@ qemuProcessStart(virConnectPtr conn, ret = 0;
cleanup: + vm->starting = false; qemuProcessIncomingDefFree(incoming); return ret;

Pk9uIDA3LzIyLzIwMTcgMDQ6NTUgQU0sIFlpIFdhbmcgd3JvdGU6DQoNCj4+IFN0YXJ0IGEgZG9t YWluIHdoaWxzdCB1bmRlZmluZSBpdCwgaWYgc3RhcnRpbmcgZmFpbGVkIGR1aW5nIFByb2Nlc3NM YXVuY2gsDQoNCj4+IG9uIHdoaWNoIHBlcmlvZCBxZW11IGV4aXRlZCB1bmV4cGVjdGVkbHksIHRo ZSBvcGVyYXRpb24gd2lsbCBsZWFkIHRvIGZhaWx1cmUNCg0KPj4gb2YgdW5kZWZpbmUgdGhlIGRv bWFpbiB1bnRpbCBsaWJ2aXJ0ZCByZXN0YXJ0ZWQuIFRoZSByZWFzb24gaXMgdGhhdCBsaWJ2aXJ0 ZA0KDQo+PiB3aWxsIHVubG9jayB2bSBkdXJpbmcgcWVtdVByb2Nlc3NTdGFydCwgcWVtdURvbWFp blVuZGVmaW5lRmxhZ3MgY2FuIGdldCB0aGUNCg0KPj4gbG9jayBhbmQgc2V0IHZtLT5wZXJzaXN0 ZW50IDAgYnV0IG5vdCByZW1vdmUgdGhlICJhY3RpdmUiIGRvbWFpbi4NCg0KPj4gDQoNCj4+IFNp Z25lZC1vZmYtYnk6IFlpIFdhbmcgPHdhbmcueWk1OUB6dGUuY29tLmNuPg0KDQo+PiAtLS0NCg0K Pj4gIHNyYy9jb25mL2RvbWFpbl9jb25mLmggIHwgMSArDQoNCj4+ICBzcmMvcWVtdS9xZW11X2Ry aXZlci5jICB8IDYgKysrKysrDQoNCj4+ICBzcmMvcWVtdS9xZW11X3Byb2Nlc3MuYyB8IDMgKysr DQoNCj4+ICAzIGZpbGVzIGNoYW5nZWQsIDEwIGluc2VydGlvbnMoKykNCg0KPj4gDQoNCj4NCg0K PkNhbiB5b3UgYXBwbHkgYSBjb3VwbGUgb2YgcmVjZW50IHBhdGNoZXMsIHNlZToNCg0KPg0KDQo+ aHR0cHM6Ly93d3cucmVkaGF0LmNvbS9hcmNoaXZlcy9saWJ2aXItbGlzdC8yMDE3LUF1Z3VzdC9t c2cwMDM4OS5odG1sDQoNCj4NCg0KPmFuZCBzZWUgaWYgdGhvc2Ugd291bGQgcmVzb2x2ZSB3aGF0 IHlvdSdyZSBzZWVpbmcuLi4gd2l0aG91dCB0aGVzZQ0KDQo+Y2hhbmdlcyBvZiBjb3Vyc2UuLi4N Cg0KDQoNCg0KWWVhaCwgSXQgd29ya3MuIFRoYW5rcyBmb3IgdGhlIHBhdGNoLCB3aGljaCBpcyB0 aGUgY29ycmVjdCBzb2x1dGlvbiBmb3IgbXkgcHJvYmxlbS4NCg0KDQotLS0NCg0KQmVzdCB3aXNo ZXMNCg0KWWkgV2FuZw==
participants (4)
-
John Ferlan
-
Peter Krempa
-
wang.yi59@zte.com.cn
-
Yi Wang