[libvirt] Introduction of virDomainDefPtr resulted in deadlock in xenUnifiedDomainGetXMLDesc

The problem is the mutex lock on xenUnifiedPrivatePtr which is held around xenDomainUsedCpus. xenUnifiedDomainGetXMLDesc ... xenUnifiedLock(priv); cpus = xenDomainUsedCpus(dom); xenUnifiedUnlock(priv); ... Unfortunately the introduction of virDomainDefPtr added the following call paths xenDomainUsedCpus ... nb_vcpu = xenUnifiedDomainGetMaxVcpus(dom); return xenUnifiedDomainGetVcpusFlags(...) ... if (!(def = xenGetDomainDefForDom(dom))) return xenGetDomainDefForUUID(dom->conn, dom->uuid); ... ret = xenHypervisorLookupDomainByUUID(conn, uuid); ... xenUnifiedLock(priv); name = xenStoreDomainGetName(conn, id); xenUnifiedUnlock(priv); ... if ((ncpus = xenUnifiedDomainGetVcpus(dom, cpuinfo, nb_vcpu, ... if (!(def = xenGetDomainDefForDom(dom))) [again like above] Right now, running the GetXMLDesc command for an active Xen domain will lock up right in the xenUnifiedDomainGetMaxVcpus call. But any subcall leading to a call to xenGetDomainDefForDom while holding the xenUnifiedPrivatePtr lock will have the same fate. I assume the lock around the xenDomainUsedCpus call is there to ensure all accesses to the private pointer see consistent data. Otherwise it would be possible to simply release the lock before the GetMaxVcpus and GetVcpus calls. If that lock cannot be dropped this feels like a much more painful rework is needed. What do others think? -Stefan

So this is one way I am able to get this issue resolved. It might not be ideal as maybe there is a slight chance of inconsistencies. But at least it does not lock up anymore. -Stefan ---
From f0c28a9f7af84a01bb2c3d71067adefb92e919d9 Mon Sep 17 00:00:00 2001 From: Stefan Bader <stefan.bader@canonical.com> Date: Mon, 24 Jun 2013 09:30:20 +0200 Subject: [PATCH] xen: Avoid deadlock in xenUnifiedDomainGetXMLDesc
Commit 95e18efd added the use virDomainDefPtr to several Xen driver methods. That structure is obtained by calling xenGetDomainDefForDom() which will acquire the mutex to the priv pointer. Unfortunately (at least) xenUnifiedDomainGetXMLDesc already takes that mutex and now is locking up while calling into xenDomainUsedCpus(). xenDomainUsedCpus ... nb_vcpu = xenUnifiedDomainGetMaxVcpus(dom); return xenUnifiedDomainGetVcpusFlags(...) ... if (!(def = xenGetDomainDefForDom(dom))) return xenGetDomainDefForUUID(dom->conn, dom->uuid); ... ret = xenHypervisorLookupDomainByUUID(conn, uuid); ... xenUnifiedLock(priv); name = xenStoreDomainGetName(conn, id); xenUnifiedUnlock(priv); ... if ((ncpus = xenUnifiedDomainGetVcpus(dom, cpuinfo, nb_vcpu, ... if (!(def = xenGetDomainDefForDom(dom))) [again like above] The deadlock is observable when connecting to a Xen host using the old xm stack and trying to obtain domain XML data from any running guest (like "dumpxml 0"). This would be a minimal fix that tries to avoid the deadlock. I am not completely sure this is not allowing a small window for getting inconsistent data. Signed-off-by: Stefan Bader <stefan.bader@canonical.com> --- src/xen/xen_driver.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/src/xen/xen_driver.c b/src/xen/xen_driver.c index 3efc27a..64cc305 100644 --- a/src/xen/xen_driver.c +++ b/src/xen/xen_driver.c @@ -189,6 +189,7 @@ xenDomainUsedCpus(virDomainPtr dom) unsigned char *cpumap = NULL; size_t cpumaplen; int nb = 0; + int nbNodeCpus; int n, m; virVcpuInfoPtr cpuinfo = NULL; virNodeInfo nodeinfo; @@ -198,8 +199,11 @@ xenDomainUsedCpus(virDomainPtr dom) return NULL; priv = dom->conn->privateData; + xenUnifiedLock(priv); + nbNodeCpus = priv->nbNodeCpus; + xenUnifiedUnlock(priv); - if (priv->nbNodeCpus <= 0) + if (nbNodeCpus <= 0) return NULL; nb_vcpu = xenUnifiedDomainGetMaxVcpus(dom); if (nb_vcpu <= 0) @@ -207,7 +211,7 @@ xenDomainUsedCpus(virDomainPtr dom) if (xenUnifiedNodeGetInfo(dom->conn, &nodeinfo) < 0) return NULL; - if (!(cpulist = virBitmapNew(priv->nbNodeCpus))) { + if (!(cpulist = virBitmapNew(nbNodeCpus))) { virReportOOMError(); goto done; } @@ -225,7 +229,7 @@ xenDomainUsedCpus(virDomainPtr dom) if ((ncpus = xenUnifiedDomainGetVcpus(dom, cpuinfo, nb_vcpu, cpumap, cpumaplen)) >= 0) { for (n = 0; n < ncpus; n++) { - for (m = 0; m < priv->nbNodeCpus; m++) { + for (m = 0; m < nbNodeCpus; m++) { bool used; ignore_value(virBitmapGetBit(cpulist, m, &used)); if ((!used) && @@ -233,7 +237,7 @@ xenDomainUsedCpus(virDomainPtr dom) ignore_value(virBitmapSetBit(cpulist, m)); nb++; /* if all CPU are used just return NULL */ - if (nb == priv->nbNodeCpus) + if (nb == nbNodeCpus) goto done; } @@ -1403,9 +1407,7 @@ xenUnifiedDomainGetXMLDesc(virDomainPtr dom, unsigned int flags) def = xenXMDomainGetXMLDesc(dom->conn, minidef); } else { char *cpus; - xenUnifiedLock(priv); cpus = xenDomainUsedCpus(dom); - xenUnifiedUnlock(priv); def = xenDaemonDomainGetXMLDesc(dom->conn, minidef, cpus); VIR_FREE(cpus); } -- 1.7.9.5

So this is one way I am able to get this issue resolved. It might not be ideal as maybe there is a slight chance of inconsistencies. But at least it does not lock up anymore. -Stefan ---
From f0c28a9f7af84a01bb2c3d71067adefb92e919d9 Mon Sep 17 00:00:00 2001 From: Stefan Bader <stefan.bader@canonical.com> Date: Mon, 24 Jun 2013 09:30:20 +0200 Subject: [PATCH] xen: Avoid deadlock in xenUnifiedDomainGetXMLDesc
Commit 95e18efd added the use virDomainDefPtr to several Xen driver methods. That structure is obtained by calling xenGetDomainDefForDom() which will acquire the mutex to the priv pointer. Unfortunately (at least) xenUnifiedDomainGetXMLDesc already takes that mutex and now is locking up while calling into xenDomainUsedCpus(). xenDomainUsedCpus ... nb_vcpu = xenUnifiedDomainGetMaxVcpus(dom); return xenUnifiedDomainGetVcpusFlags(...) ... if (!(def = xenGetDomainDefForDom(dom))) return xenGetDomainDefForUUID(dom->conn, dom->uuid); ... ret = xenHypervisorLookupDomainByUUID(conn, uuid); ... xenUnifiedLock(priv); name = xenStoreDomainGetName(conn, id); xenUnifiedUnlock(priv); ... if ((ncpus = xenUnifiedDomainGetVcpus(dom, cpuinfo, nb_vcpu, ... if (!(def = xenGetDomainDefForDom(dom))) [again like above] The deadlock is observable when connecting to a Xen host using the old xm stack and trying to obtain domain XML data from any running guest (like "dumpxml 0"). This would be a minimal fix that tries to avoid the deadlock. I am not completely sure this is not allowing a small window for getting inconsistent data. Signed-off-by: Stefan Bader <stefan.bader@canonical.com> --- src/xen/xen_driver.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/src/xen/xen_driver.c b/src/xen/xen_driver.c index 3efc27a..64cc305 100644 --- a/src/xen/xen_driver.c +++ b/src/xen/xen_driver.c @@ -189,6 +189,7 @@ xenDomainUsedCpus(virDomainPtr dom) unsigned char *cpumap = NULL; size_t cpumaplen; int nb = 0; + int nbNodeCpus; int n, m; virVcpuInfoPtr cpuinfo = NULL; virNodeInfo nodeinfo; @@ -198,8 +199,11 @@ xenDomainUsedCpus(virDomainPtr dom) return NULL; priv = dom->conn->privateData; + xenUnifiedLock(priv); + nbNodeCpus = priv->nbNodeCpus; + xenUnifiedUnlock(priv); - if (priv->nbNodeCpus <= 0) + if (nbNodeCpus <= 0) return NULL; nb_vcpu = xenUnifiedDomainGetMaxVcpus(dom); if (nb_vcpu <= 0) @@ -207,7 +211,7 @@ xenDomainUsedCpus(virDomainPtr dom) if (xenUnifiedNodeGetInfo(dom->conn, &nodeinfo) < 0) return NULL; - if (!(cpulist = virBitmapNew(priv->nbNodeCpus))) { + if (!(cpulist = virBitmapNew(nbNodeCpus))) { virReportOOMError(); goto done; } @@ -225,7 +229,7 @@ xenDomainUsedCpus(virDomainPtr dom) if ((ncpus = xenUnifiedDomainGetVcpus(dom, cpuinfo, nb_vcpu, cpumap, cpumaplen)) >= 0) { for (n = 0; n < ncpus; n++) { - for (m = 0; m < priv->nbNodeCpus; m++) { + for (m = 0; m < nbNodeCpus; m++) { bool used; ignore_value(virBitmapGetBit(cpulist, m, &used)); if ((!used) && @@ -233,7 +237,7 @@ xenDomainUsedCpus(virDomainPtr dom) ignore_value(virBitmapSetBit(cpulist, m)); nb++; /* if all CPU are used just return NULL */ - if (nb == priv->nbNodeCpus) + if (nb == nbNodeCpus) goto done; } @@ -1403,9 +1407,7 @@ xenUnifiedDomainGetXMLDesc(virDomainPtr dom, unsigned int flags) def = xenXMDomainGetXMLDesc(dom->conn, minidef); } else { char *cpus; - xenUnifiedLock(priv); cpus = xenDomainUsedCpus(dom); - xenUnifiedUnlock(priv); def = xenDaemonDomainGetXMLDesc(dom->conn, minidef, cpus); VIR_FREE(cpus); } -- 1.7.9.5

On Mon, Jun 24, 2013 at 11:08:16AM +0200, Stefan Bader wrote:
So this is one way I am able to get this issue resolved. It might not be ideal as maybe there is a slight chance of inconsistencies. But at least it does not lock up anymore.
-Stefan
---
From f0c28a9f7af84a01bb2c3d71067adefb92e919d9 Mon Sep 17 00:00:00 2001 From: Stefan Bader <stefan.bader@canonical.com> Date: Mon, 24 Jun 2013 09:30:20 +0200 Subject: [PATCH] xen: Avoid deadlock in xenUnifiedDomainGetXMLDesc
Commit 95e18efd added the use virDomainDefPtr to several Xen driver methods. That structure is obtained by calling xenGetDomainDefForDom() which will acquire the mutex to the priv pointer. Unfortunately (at least) xenUnifiedDomainGetXMLDesc already takes that mutex and now is locking up while calling into xenDomainUsedCpus().
xenDomainUsedCpus ... nb_vcpu = xenUnifiedDomainGetMaxVcpus(dom); return xenUnifiedDomainGetVcpusFlags(...) ... if (!(def = xenGetDomainDefForDom(dom))) return xenGetDomainDefForUUID(dom->conn, dom->uuid); ... ret = xenHypervisorLookupDomainByUUID(conn, uuid); ... xenUnifiedLock(priv); name = xenStoreDomainGetName(conn, id); xenUnifiedUnlock(priv); ... if ((ncpus = xenUnifiedDomainGetVcpus(dom, cpuinfo, nb_vcpu, ... if (!(def = xenGetDomainDefForDom(dom))) [again like above]
The deadlock is observable when connecting to a Xen host using the old xm stack and trying to obtain domain XML data from any running guest (like "dumpxml 0").
This would be a minimal fix that tries to avoid the deadlock. I am not completely sure this is not allowing a small window for getting inconsistent data.
The root cause is that one public API is calling into another public API. This is bad practice in general - as well as the deadlock you see, it will result in multiple access control checks which should not happen. I'm looking into how to refactor the code to simplify the calling pattern here. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
participants (2)
-
Daniel P. Berrange
-
Stefan Bader