[libvirt] [PATCH] Fix buzzilla 738778

This patch fixes the bug shown in bugzilla 738778. It's not an nwfilter problem but a connection sharing / closure issue. https://bugzilla.redhat.com/show_bug.cgi?id=738778 Depending on the speed / #CPUs of the machine you are using you may not see this bug all the time. Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index f8a8475..1991777 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2539,6 +2539,10 @@ struct qemuProcessReconnectData { /* * Open an existing VM's monitor, re-detect VCPU threads * and re-reserve the security labels in use + * + * We own the virConnectPtr we are passed here - whoever started + * this thread function has increased the reference counter to it + * so that we now have to close it. */ static void qemuProcessReconnect(void *opaque) @@ -2632,6 +2636,8 @@ qemuProcessReconnect(void *opaque) qemuDriverUnlock(driver); + virConnectClose(conn); + return; error: @@ -2656,6 +2662,8 @@ error: virDomainObjUnlock(obj); } qemuDriverUnlock(driver); + + virConnectClose(conn); } static void @@ -2706,7 +2714,16 @@ qemuProcessReconnectHelper(void *payload, if (qemuDomainObjBeginJobWithDriver(src->driver, obj, QEMU_JOB_MODIFY) < 0) goto error; + /* Since we close the connection later on, we have to make sure + * that the threads we start see a valid connection throughout their + * lifetime. We simply increase the reference counter here. + */ + virConnectRef(data->conn); + if (virThreadCreate(&thread, true, qemuProcessReconnect, data) < 0) { + + virConnectClose(data->conn); + qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("Could not create thread. QEMU initialization " "might be incomplete"));

On Fri, Sep 16, 2011 at 09:03:30AM -0400, Stefan Berger wrote:
This patch fixes the bug shown in bugzilla 738778. It's not an nwfilter problem but a connection sharing / closure issue.
https://bugzilla.redhat.com/show_bug.cgi?id=738778
Depending on the speed / #CPUs of the machine you are using you may not see this bug all the time.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index f8a8475..1991777 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2539,6 +2539,10 @@ struct qemuProcessReconnectData { /* * Open an existing VM's monitor, re-detect VCPU threads * and re-reserve the security labels in use + * + * We own the virConnectPtr we are passed here - whoever started + * this thread function has increased the reference counter to it + * so that we now have to close it. */ static void qemuProcessReconnect(void *opaque) @@ -2632,6 +2636,8 @@ qemuProcessReconnect(void *opaque)
qemuDriverUnlock(driver);
+ virConnectClose(conn); + return;
error: @@ -2656,6 +2662,8 @@ error: virDomainObjUnlock(obj); } qemuDriverUnlock(driver); + + virConnectClose(conn); }
static void @@ -2706,7 +2714,16 @@ qemuProcessReconnectHelper(void *payload, if (qemuDomainObjBeginJobWithDriver(src->driver, obj, QEMU_JOB_MODIFY) < 0) goto error;
+ /* Since we close the connection later on, we have to make sure + * that the threads we start see a valid connection throughout their + * lifetime. We simply increase the reference counter here. + */ + virConnectRef(data->conn); + if (virThreadCreate(&thread, true, qemuProcessReconnect, data) < 0) { + + virConnectClose(data->conn); + qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("Could not create thread. QEMU initialization " "might be incomplete"));
That sounds like a nasty bug and hard to track ! I think this makes sense, hopefully I'm right :-) ACK, please push ! Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

On 09/16/2011 09:39 AM, Daniel Veillard wrote:
On Fri, Sep 16, 2011 at 09:03:30AM -0400, Stefan Berger wrote:
This patch fixes the bug shown in bugzilla 738778. It's not an nwfilter problem but a connection sharing / closure issue.
https://bugzilla.redhat.com/show_bug.cgi?id=738778
Depending on the speed / #CPUs of the machine you are using you may not see this bug all the time.
Signed-off-by: Stefan Berger<stefanb@linux.vnet.ibm.com>
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index f8a8475..1991777 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2539,6 +2539,10 @@ struct qemuProcessReconnectData { /* * Open an existing VM's monitor, re-detect VCPU threads * and re-reserve the security labels in use + * + * We own the virConnectPtr we are passed here - whoever started + * this thread function has increased the reference counter to it + * so that we now have to close it. */ static void qemuProcessReconnect(void *opaque) @@ -2632,6 +2636,8 @@ qemuProcessReconnect(void *opaque)
qemuDriverUnlock(driver);
+ virConnectClose(conn); + return;
error: @@ -2656,6 +2662,8 @@ error: virDomainObjUnlock(obj); } qemuDriverUnlock(driver); + + virConnectClose(conn); }
static void @@ -2706,7 +2714,16 @@ qemuProcessReconnectHelper(void *payload, if (qemuDomainObjBeginJobWithDriver(src->driver, obj, QEMU_JOB_MODIFY)< 0) goto error;
+ /* Since we close the connection later on, we have to make sure + * that the threads we start see a valid connection throughout their + * lifetime. We simply increase the reference counter here. + */ + virConnectRef(data->conn); + if (virThreadCreate(&thread, true, qemuProcessReconnect, data)< 0) { + + virConnectClose(data->conn); + qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("Could not create thread. QEMU initialization " "might be incomplete")); That sounds like a nasty bug and hard to track ! I think this makes sense, hopefully I'm right :-) At least I don't see it anymore. Hopefully Laine is down to 0% from 100% of seeing this one. ACK, please push !
Pushed. Stefan
Daniel

On 09/16/2011 09:49 AM, Stefan Berger wrote:
On 09/16/2011 09:39 AM, Daniel Veillard wrote:
On Fri, Sep 16, 2011 at 09:03:30AM -0400, Stefan Berger wrote:
This patch fixes the bug shown in bugzilla 738778. It's not an nwfilter problem but a connection sharing / closure issue.
https://bugzilla.redhat.com/show_bug.cgi?id=738778
Depending on the speed / #CPUs of the machine you are using you may not see this bug all the time.
[...] That sounds like a nasty bug and hard to track ! I think this makes sense, hopefully I'm right :-) At least I don't see it anymore. Hopefully Laine is down to 0% from 100% of seeing this one.
Just FYI, yes this did completely eliminate the crashes. Thanks for finding the source!
participants (3)
-
Daniel Veillard
-
Laine Stump
-
Stefan Berger