[libvirt] libxl: Issues with virt-manager when used to manager Xen domains

This started off with some regression testing after going forward to Xen-4.4. We currently would pair that with a libvirt version 1.2.2 and right now operations through virsh seem to be working (mostly) well. But when using virt-manager (not the most up-to-date versions but some combinations that quite likely will occur (0.9.1 and 0.9.5)) there are some issues. Two symptoms seem to be caused by the same underlying problem which is caused by the way virt-manager interacts with libvirt. That seems to be that virt-manager acquires a reference to virDomainPtr only once and then uses that for subsequent call. This is a problem because both virDomainPtr object and virDomainObjPtr objects contains a domid but only the latter gets updated. So both, creating a new guest (which must be a define step and then use the handle to start the guest) as well as rebooting a guest cause problems for virt-manager. Mainly because of libxlDomainGetInfo() which retrieves the virDomainObjPtr for a given virDommainPtr. Then checks the domain object to be active or not but uses the domid from the domain to call into libxl: if (!(vm = libxlDomObjFromDomain(dom))) goto cleanup; ... if (!virDomainObjIsActive(vm)) { info->cpuTime = 0; info->memory = vm->def->mem.cur_balloon; info->maxMem = vm->def->mem.max_balloon; } else { if (libxl_domain_info(priv->ctx, &d_info, dom->id) != 0) { This fails either with dom->id being -1 (still) or using the old id from before the reboot. A dirty hack seems to avoid this: static virDomainObjPtr libxlDomObjFromDomain(virDomainPtr dom) { virDomainObjPtr vm; libxlDriverPrivatePtr driver = dom->conn->privateData; char uuidstr[VIR_UUID_STRING_BUFLEN]; vm = virDomainObjListFindByUUID(driver->domains, dom->uuid); if (!vm) { virUUIDFormat(dom->uuid, uuidstr); virReportError(VIR_ERR_NO_DOMAIN, _("no domain with matching uuid '%s' (%s)"), uuidstr, dom->name); return NULL; - } + } else { + if (dom->id != vm->def->id) { + VIR_WARN("libxlDomObjFromDomain: domid changed (%d->%d)", + dom->id, vm->def->id); + dom->id = vm->def->id; + } + } return vm; } Not really sure this is the right way to go. Also because that warning above happens a lot more (repeatedly) than I would have expected. This may or may not be related to the next question. Not a problem but more a confusion on my side: virGetDomain() has commentary that says it will return either a pointer to an existing object or one to a new one. Its just, I cannot see that done in the code. Is that a future plan or was that way but was removed? I must admit I have not looked deeper into that but one explanation of the repeated domid mismatch would be if the virDomainPtr that virt-manager uses to communicate with libvirt would be cloned before doing things inside libvirt. And holding that handle for so long and not obtaining a fresh one through the various lookup interfaces each time could be a mistake of virt-manager. Unfortunately one that needs to be handled in some way. -Stefan

On Tue, Mar 25, 2014 at 04:22:54PM +0100, Stefan Bader wrote:
This started off with some regression testing after going forward to Xen-4.4. We currently would pair that with a libvirt version 1.2.2 and right now operations through virsh seem to be working (mostly) well. But when using virt-manager (not the most up-to-date versions but some combinations that quite likely will occur (0.9.1 and 0.9.5)) there are some issues.
Two symptoms seem to be caused by the same underlying problem which is caused by the way virt-manager interacts with libvirt. That seems to be that virt-manager acquires a reference to virDomainPtr only once and then uses that for subsequent call. This is a problem because both virDomainPtr object and virDomainObjPtr objects contains a domid but only the latter gets updated.
So both, creating a new guest (which must be a define step and then use the handle to start the guest) as well as rebooting a guest cause problems for virt-manager. Mainly because of libxlDomainGetInfo() which retrieves the virDomainObjPtr for a given virDommainPtr. Then checks the domain object to be active or not but uses the domid from the domain to call into libxl:
if (!(vm = libxlDomObjFromDomain(dom))) goto cleanup; ... if (!virDomainObjIsActive(vm)) { info->cpuTime = 0; info->memory = vm->def->mem.cur_balloon; info->maxMem = vm->def->mem.max_balloon; } else { if (libxl_domain_info(priv->ctx, &d_info, dom->id) != 0) {
This fails either with dom->id being -1 (still) or using the old id from before the reboot. A dirty hack seems to avoid this:
Yep, this is a pretty commonly hit problem. The solution is to *never* access dom->id in the driver code at all. eg In this code snippet above you could use vm->def->id instead which should be up to date. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 25.03.2014 16:36, Daniel P. Berrange wrote:
On Tue, Mar 25, 2014 at 04:22:54PM +0100, Stefan Bader wrote:
This started off with some regression testing after going forward to Xen-4.4. We currently would pair that with a libvirt version 1.2.2 and right now operations through virsh seem to be working (mostly) well. But when using virt-manager (not the most up-to-date versions but some combinations that quite likely will occur (0.9.1 and 0.9.5)) there are some issues.
Two symptoms seem to be caused by the same underlying problem which is caused by the way virt-manager interacts with libvirt. That seems to be that virt-manager acquires a reference to virDomainPtr only once and then uses that for subsequent call. This is a problem because both virDomainPtr object and virDomainObjPtr objects contains a domid but only the latter gets updated.
So both, creating a new guest (which must be a define step and then use the handle to start the guest) as well as rebooting a guest cause problems for virt-manager. Mainly because of libxlDomainGetInfo() which retrieves the virDomainObjPtr for a given virDommainPtr. Then checks the domain object to be active or not but uses the domid from the domain to call into libxl:
if (!(vm = libxlDomObjFromDomain(dom))) goto cleanup; ... if (!virDomainObjIsActive(vm)) { info->cpuTime = 0; info->memory = vm->def->mem.cur_balloon; info->maxMem = vm->def->mem.max_balloon; } else { if (libxl_domain_info(priv->ctx, &d_info, dom->id) != 0) {
This fails either with dom->id being -1 (still) or using the old id from before the reboot. A dirty hack seems to avoid this:
Yep, this is a pretty commonly hit problem. The solution is to *never* access dom->id in the driver code at all. eg In this code snippet above you could use vm->def->id instead which should be up to date.
OK, it just seems for something that should not be done, it is done plenty of times in the libxl_driver codebase right now. At least pause and unpause seem to do the same... -Stefan
Regards, Daniel

On Tue, Mar 25, 2014 at 04:42:25PM +0100, Stefan Bader wrote:
On 25.03.2014 16:36, Daniel P. Berrange wrote:
On Tue, Mar 25, 2014 at 04:22:54PM +0100, Stefan Bader wrote:
This started off with some regression testing after going forward to Xen-4.4. We currently would pair that with a libvirt version 1.2.2 and right now operations through virsh seem to be working (mostly) well. But when using virt-manager (not the most up-to-date versions but some combinations that quite likely will occur (0.9.1 and 0.9.5)) there are some issues.
Two symptoms seem to be caused by the same underlying problem which is caused by the way virt-manager interacts with libvirt. That seems to be that virt-manager acquires a reference to virDomainPtr only once and then uses that for subsequent call. This is a problem because both virDomainPtr object and virDomainObjPtr objects contains a domid but only the latter gets updated.
So both, creating a new guest (which must be a define step and then use the handle to start the guest) as well as rebooting a guest cause problems for virt-manager. Mainly because of libxlDomainGetInfo() which retrieves the virDomainObjPtr for a given virDommainPtr. Then checks the domain object to be active or not but uses the domid from the domain to call into libxl:
if (!(vm = libxlDomObjFromDomain(dom))) goto cleanup; ... if (!virDomainObjIsActive(vm)) { info->cpuTime = 0; info->memory = vm->def->mem.cur_balloon; info->maxMem = vm->def->mem.max_balloon; } else { if (libxl_domain_info(priv->ctx, &d_info, dom->id) != 0) {
This fails either with dom->id being -1 (still) or using the old id from before the reboot. A dirty hack seems to avoid this:
Yep, this is a pretty commonly hit problem. The solution is to *never* access dom->id in the driver code at all. eg In this code snippet above you could use vm->def->id instead which should be up to date.
OK, it just seems for something that should not be done, it is done plenty of times in the libxl_driver codebase right now. At least pause and unpause seem to do the same...
I expect those pretty much all need to be fixed Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 25.03.2014 16:46, Daniel P. Berrange wrote:
On Tue, Mar 25, 2014 at 04:42:25PM +0100, Stefan Bader wrote:
On 25.03.2014 16:36, Daniel P. Berrange wrote:
On Tue, Mar 25, 2014 at 04:22:54PM +0100, Stefan Bader wrote:
This started off with some regression testing after going forward to Xen-4.4. We currently would pair that with a libvirt version 1.2.2 and right now operations through virsh seem to be working (mostly) well. But when using virt-manager (not the most up-to-date versions but some combinations that quite likely will occur (0.9.1 and 0.9.5)) there are some issues.
Two symptoms seem to be caused by the same underlying problem which is caused by the way virt-manager interacts with libvirt. That seems to be that virt-manager acquires a reference to virDomainPtr only once and then uses that for subsequent call. This is a problem because both virDomainPtr object and virDomainObjPtr objects contains a domid but only the latter gets updated.
So both, creating a new guest (which must be a define step and then use the handle to start the guest) as well as rebooting a guest cause problems for virt-manager. Mainly because of libxlDomainGetInfo() which retrieves the virDomainObjPtr for a given virDommainPtr. Then checks the domain object to be active or not but uses the domid from the domain to call into libxl:
if (!(vm = libxlDomObjFromDomain(dom))) goto cleanup; ... if (!virDomainObjIsActive(vm)) { info->cpuTime = 0; info->memory = vm->def->mem.cur_balloon; info->maxMem = vm->def->mem.max_balloon; } else { if (libxl_domain_info(priv->ctx, &d_info, dom->id) != 0) {
This fails either with dom->id being -1 (still) or using the old id from before the reboot. A dirty hack seems to avoid this:
Yep, this is a pretty commonly hit problem. The solution is to *never* access dom->id in the driver code at all. eg In this code snippet above you could use vm->def->id instead which should be up to date.
OK, it just seems for something that should not be done, it is done plenty of times in the libxl_driver codebase right now. At least pause and unpause seem to do the same...
I expect those pretty much all need to be fixed
All-right, thanks for confirming. :) Will work on that. -Stefan
Regards, Daniel

There is a domain id in the virDomain structure as well as in the virDomainObj structure. While the former can become stale the latter is kept up to date. So it is safer to always (virDomainObjPtr)->def->id internally. This will fix issues seen when managing Xen guests through libvirt from virt-manager (not being able to get domain info after define or reboot). This was caused both though libxlDomainGetInfo() only but there were a lot of places that might potentially cause issues, too. Signed-off-by: Stefan Bader <stefan.bader@canonical.com> --- src/libxl/libxl_driver.c | 75 +++++++++++++++++++++++----------------------- 1 file changed, 38 insertions(+), 37 deletions(-) diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index fc97db4..b5061df 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -770,10 +770,10 @@ libxlDomainSuspend(virDomainPtr dom) priv = vm->privateData; if (virDomainObjGetState(vm, NULL) != VIR_DOMAIN_PAUSED) { - if (libxl_domain_pause(priv->ctx, dom->id) != 0) { + if (libxl_domain_pause(priv->ctx, vm->def->id) != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Failed to suspend domain '%d' with libxenlight"), - dom->id); + vm->def->id); goto endjob; } @@ -829,10 +829,10 @@ libxlDomainResume(virDomainPtr dom) priv = vm->privateData; if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_PAUSED) { - if (libxl_domain_unpause(priv->ctx, dom->id) != 0) { + if (libxl_domain_unpause(priv->ctx, vm->def->id) != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Failed to resume domain '%d' with libxenlight"), - dom->id); + vm->def->id); goto endjob; } @@ -883,10 +883,10 @@ libxlDomainShutdownFlags(virDomainPtr dom, unsigned int flags) } priv = vm->privateData; - if (libxl_domain_shutdown(priv->ctx, dom->id) != 0) { + if (libxl_domain_shutdown(priv->ctx, vm->def->id) != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Failed to shutdown domain '%d' with libxenlight"), - dom->id); + vm->def->id); goto cleanup; } @@ -930,10 +930,10 @@ libxlDomainReboot(virDomainPtr dom, unsigned int flags) } priv = vm->privateData; - if (libxl_domain_reboot(priv->ctx, dom->id) != 0) { + if (libxl_domain_reboot(priv->ctx, vm->def->id) != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Failed to reboot domain '%d' with libxenlight"), - dom->id); + vm->def->id); goto cleanup; } ret = 0; @@ -974,7 +974,7 @@ libxlDomainDestroyFlags(virDomainPtr dom, priv = vm->privateData; if (libxl_domain_destroy(priv->ctx, vm->def->id, NULL) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, - _("Failed to destroy domain '%d'"), dom->id); + _("Failed to destroy domain '%d'"), vm->def->id); goto cleanup; } @@ -1105,10 +1105,10 @@ libxlDomainSetMemoryFlags(virDomainPtr dom, unsigned long newmem, if (flags & VIR_DOMAIN_MEM_LIVE) { priv = vm->privateData; - if (libxl_domain_setmaxmem(priv->ctx, dom->id, newmem) < 0) { + if (libxl_domain_setmaxmem(priv->ctx, vm->def->id, newmem) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Failed to set maximum memory for domain '%d'" - " with libxenlight"), dom->id); + " with libxenlight"), vm->def->id); goto endjob; } } @@ -1138,13 +1138,13 @@ libxlDomainSetMemoryFlags(virDomainPtr dom, unsigned long newmem, priv = vm->privateData; /* Unlock virDomainObj while ballooning memory */ virObjectUnlock(vm); - res = libxl_set_memory_target(priv->ctx, dom->id, newmem, 0, + res = libxl_set_memory_target(priv->ctx, vm->def->id, newmem, 0, /* force */ 1); virObjectLock(vm); if (res < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Failed to set memory for domain '%d'" - " with libxenlight"), dom->id); + " with libxenlight"), vm->def->id); goto endjob; } } @@ -1202,9 +1202,10 @@ libxlDomainGetInfo(virDomainPtr dom, virDomainInfoPtr info) info->memory = vm->def->mem.cur_balloon; info->maxMem = vm->def->mem.max_balloon; } else { - if (libxl_domain_info(priv->ctx, &d_info, dom->id) != 0) { + if (libxl_domain_info(priv->ctx, &d_info, vm->def->id) != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, - _("libxl_domain_info failed for domain '%d'"), dom->id); + _("libxl_domain_info failed for domain '%d'"), + vm->def->id); goto cleanup; } info->cpuTime = d_info.cpu_time; @@ -1483,11 +1484,11 @@ libxlDomainCoreDump(virDomainPtr dom, const char *to, unsigned int flags) if (!(flags & VIR_DUMP_LIVE) && virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) { - if (libxl_domain_pause(priv->ctx, dom->id) != 0) { + if (libxl_domain_pause(priv->ctx, vm->def->id) != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Before dumping core, failed to suspend domain '%d'" " with libxenlight"), - dom->id); + vm->def->id); goto endjob; } virDomainObjSetState(vm, VIR_DOMAIN_PAUSED, VIR_DOMAIN_PAUSED_DUMP); @@ -1496,20 +1497,20 @@ libxlDomainCoreDump(virDomainPtr dom, const char *to, unsigned int flags) /* Unlock virDomainObj while dumping core */ virObjectUnlock(vm); - ret = libxl_domain_core_dump(priv->ctx, dom->id, to, NULL); + ret = libxl_domain_core_dump(priv->ctx, vm->def->id, to, NULL); virObjectLock(vm); if (ret != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Failed to dump core of domain '%d' with libxenlight"), - dom->id); + vm->def->id); ret = -1; goto unpause; } if (flags & VIR_DUMP_CRASH) { - if (libxl_domain_destroy(priv->ctx, dom->id, NULL) < 0) { + if (libxl_domain_destroy(priv->ctx, vm->def->id, NULL) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, - _("Failed to destroy domain '%d'"), dom->id); + _("Failed to destroy domain '%d'"), vm->def->id); goto unpause; } @@ -1524,10 +1525,10 @@ libxlDomainCoreDump(virDomainPtr dom, const char *to, unsigned int flags) unpause: if (virDomainObjIsActive(vm) && paused) { - if (libxl_domain_unpause(priv->ctx, dom->id) != 0) { + if (libxl_domain_unpause(priv->ctx, vm->def->id) != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("After dumping core, failed to resume domain '%d' with" - " libxenlight"), dom->id); + " libxenlight"), vm->def->id); } else { virDomainObjSetState(vm, VIR_DOMAIN_RUNNING, VIR_DOMAIN_RUNNING_UNPAUSED); @@ -1786,19 +1787,19 @@ libxlDomainSetVcpusFlags(virDomainPtr dom, unsigned int nvcpus, break; case VIR_DOMAIN_VCPU_LIVE: - if (libxl_set_vcpuonline(priv->ctx, dom->id, &map) != 0) { + if (libxl_set_vcpuonline(priv->ctx, vm->def->id, &map) != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Failed to set vcpus for domain '%d'" - " with libxenlight"), dom->id); + " with libxenlight"), vm->def->id); goto endjob; } break; case VIR_DOMAIN_VCPU_LIVE | VIR_DOMAIN_VCPU_CONFIG: - if (libxl_set_vcpuonline(priv->ctx, dom->id, &map) != 0) { + if (libxl_set_vcpuonline(priv->ctx, vm->def->id, &map) != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Failed to set vcpus for domain '%d'" - " with libxenlight"), dom->id); + " with libxenlight"), vm->def->id); goto endjob; } def->vcpus = nvcpus; @@ -1934,7 +1935,7 @@ libxlDomainPinVcpuFlags(virDomainPtr dom, unsigned int vcpu, libxlDomainObjPrivatePtr priv; priv = vm->privateData; - if (libxl_set_vcpuaffinity(priv->ctx, dom->id, vcpu, &map) != 0) { + if (libxl_set_vcpuaffinity(priv->ctx, vm->def->id, vcpu, &map) != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Failed to pin vcpu '%d' with libxenlight"), vcpu); @@ -2099,11 +2100,11 @@ libxlDomainGetVcpus(virDomainPtr dom, virVcpuInfoPtr info, int maxinfo, } priv = vm->privateData; - if ((vcpuinfo = libxl_list_vcpu(priv->ctx, dom->id, &maxcpu, + if ((vcpuinfo = libxl_list_vcpu(priv->ctx, vm->def->id, &maxcpu, &hostcpus)) == NULL) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Failed to list vcpus for domain '%d' with libxenlight"), - dom->id); + vm->def->id); goto cleanup; } @@ -3608,7 +3609,7 @@ libxlDomainGetSchedulerType(virDomainPtr dom, int *nparams) default: virReportError(VIR_ERR_INTERNAL_ERROR, _("Failed to get scheduler id for domain '%d'" - " with libxenlight"), dom->id); + " with libxenlight"), vm->def->id); goto cleanup; } @@ -3659,10 +3660,10 @@ libxlDomainGetSchedulerParametersFlags(virDomainPtr dom, goto cleanup; } - if (libxl_domain_sched_params_get(priv->ctx, dom->id, &sc_info) != 0) { + if (libxl_domain_sched_params_get(priv->ctx, vm->def->id, &sc_info) != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Failed to get scheduler parameters for domain '%d'" - " with libxenlight"), dom->id); + " with libxenlight"), vm->def->id); goto cleanup; } @@ -3740,10 +3741,10 @@ libxlDomainSetSchedulerParametersFlags(virDomainPtr dom, goto endjob; } - if (libxl_domain_sched_params_get(priv->ctx, dom->id, &sc_info) != 0) { + if (libxl_domain_sched_params_get(priv->ctx, vm->def->id, &sc_info) != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Failed to get scheduler parameters for domain '%d'" - " with libxenlight"), dom->id); + " with libxenlight"), vm->def->id); goto endjob; } @@ -3756,10 +3757,10 @@ libxlDomainSetSchedulerParametersFlags(virDomainPtr dom, sc_info.cap = params[i].value.ui; } - if (libxl_domain_sched_params_set(priv->ctx, dom->id, &sc_info) != 0) { + if (libxl_domain_sched_params_set(priv->ctx, vm->def->id, &sc_info) != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Failed to set scheduler parameters for domain '%d'" - " with libxenlight"), dom->id); + " with libxenlight"), vm->def->id); goto endjob; } -- 1.7.9.5

On 25.03.2014 18:39, Stefan Bader wrote:
There is a domain id in the virDomain structure as well as in the virDomainObj structure. While the former can become stale the latter is kept up to date. So it is safer to always (virDomainObjPtr)->def->id internally.
This will fix issues seen when managing Xen guests through libvirt from virt-manager (not being able to get domain info after define or reboot). This was caused both though libxlDomainGetInfo() only but there were a lot of places that might potentially cause issues, too.
Signed-off-by: Stefan Bader <stefan.bader@canonical.com> --- src/libxl/libxl_driver.c | 75 +++++++++++++++++++++++----------------------- 1 file changed, 38 insertions(+), 37 deletions(-)
ACK Michal
participants (3)
-
Daniel P. Berrange
-
Michal Privoznik
-
Stefan Bader