[libvirt] [PATCH 0/3 RESENT] Fix virsh race and coredump

Please consider for 1.0.4 as this prevents a potential data corruption or segfault. The race is caused by referencing a disposed connection object in a callback. In the first patch we make sure that the object reference count is reflecting the callback registration. This prevents a premature disposal. The second patch prevents the invocation of a NULL callback. The last one tries to ensure that we don't leak connection references in virsh. Viktor Mihajlovski (3): libvirt: Increase connection reference count for callbacks remote: Don't call NULL closeFreeCallback virsh: Unregister the connection close notifier upon termination src/libvirt.c | 5 +++++ src/remote/remote_driver.c | 3 ++- tools/virsh.c | 23 +++++++++++++++++------ 3 files changed, 24 insertions(+), 7 deletions(-) -- 1.7.9.5

By adjusting the reference count of the connection object we prevent races between callback function and virConnectClose. Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> --- src/libvirt.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/src/libvirt.c b/src/libvirt.c index 1624776..1aed2b4 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -20186,6 +20186,8 @@ int virConnectRegisterCloseCallback(virConnectPtr conn, return -1; } + virObjectRef(conn); + virMutexLock(&conn->lock); virCheckNonNullArgGoto(cb, error); @@ -20206,6 +20208,7 @@ int virConnectRegisterCloseCallback(virConnectPtr conn, error: virMutexUnlock(&conn->lock); + virObjectUnref(conn); virDispatchError(NULL); return -1; } @@ -20255,6 +20258,8 @@ int virConnectUnregisterCloseCallback(virConnectPtr conn, virMutexUnlock(&conn->lock); + virObjectUnref(conn); + return 0; error: -- 1.7.9.5

On 03/26/13 10:54, Viktor Mihajlovski wrote:
By adjusting the reference count of the connection object we prevent races between callback function and virConnectClose.
Hum. Here I agree that this is definitely possible (and I managed to reproduce the invalid read and possible memory corruption) but I don't completely like the fix here. As the callback is called at a moment when the connection won't be usable anymore I think that the callback should automatically unregister itself and also clear the opaque data if that is requested. (Which isn't done right now if the caller doesn't unregister it). With automatic unregistration we save a lot of hassle, and also avoid the need of hackery that you needed to add in the 3/3 patch to avoid leaking the reference. I also think that the virConnectClose function should automatically get rid of the callback if the caller doesn't do it before. Dan, any ideas on this? Peter P.S.: This should be fixed before the release.

On Wed, Mar 27, 2013 at 11:16:26AM +0100, Peter Krempa wrote:
On 03/26/13 10:54, Viktor Mihajlovski wrote:
By adjusting the reference count of the connection object we prevent races between callback function and virConnectClose.
Hum. Here I agree that this is definitely possible (and I managed to reproduce the invalid read and possible memory corruption) but I don't completely like the fix here.
As the callback is called at a moment when the connection won't be usable anymore I think that the callback should automatically unregister itself and also clear the opaque data if that is requested. (Which isn't done right now if the caller doesn't unregister it).
This only works if we can guarantee that the callback is invoked in 100% of paths that lead to the connection closing. This isn't the case for a client initiated close operation. So we're still left with a need to manually unregister the callback, except now instead of having todo that every time, we only need todo it some of the time. Either we require apps to never unregister it, or we require apps to always unregister - anything inbetweeen is just bad.
With automatic unregistration we save a lot of hassle, and also avoid the need of hackery that you needed to add in the 3/3 patch to avoid leaking the reference. I also think that the virConnectClose function should automatically get rid of the callback if the caller doesn't do it before.
The virConnectClose function is just a thin wrapper around virObjectUnref. To explicitly remove the callback here would require that you look at the reference count, but doing so is inherantly racey, which is why the virObjectUnref method only returns a boolean, indicating whether the object has been freed, instead of the actual ref count. Requiring apps to unregister the callbacks is the same that we have with existing domain lifecycle callbacks, so is not unexpected / out of the ordinary for applications. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 03/27/13 11:51, Daniel P. Berrange wrote:
On Wed, Mar 27, 2013 at 11:16:26AM +0100, Peter Krempa wrote:
On 03/26/13 10:54, Viktor Mihajlovski wrote:
By adjusting the reference count of the connection object we prevent races between callback function and virConnectClose.
Hum. Here I agree that this is definitely possible (and I managed to reproduce the invalid read and possible memory corruption) but I don't completely like the fix here.
As the callback is called at a moment when the connection won't be usable anymore I think that the callback should automatically unregister itself and also clear the opaque data if that is requested. (Which isn't done right now if the caller doesn't unregister it).
This only works if we can guarantee that the callback is invoked in 100% of paths that lead to the connection closing. This isn't the case for a client initiated close operation. So we're still left with
a need to manually unregister the callback, except now instead of having todo that every time, we only need todo it some of the time. Either we require apps to never unregister it, or we require apps to always unregister - anything inbetweeen is just bad.
With automatic unregistration we save a lot of hassle, and also avoid the need of hackery that you needed to add in the 3/3 patch to avoid leaking the reference. I also think that the virConnectClose function should automatically get rid of the callback if the caller doesn't do it before.
The virConnectClose function is just a thin wrapper around virObjectUnref. To explicitly remove the callback here would require that you look at the reference count, but doing so is inherantly racey, which is why the virObjectUnref method only returns a boolean, indicating whether the object has been freed, instead of the actual ref count.
Requiring apps to unregister the callbacks is the same that we have with existing domain lifecycle callbacks, so is not unexpected / out of the ordinary for applications.
Okay, in that case I can't see a better solution for now other than taking this patch.
Daniel
Peter

Check function pointer before calling. Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> --- src/remote/remote_driver.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 3721af9..885120e 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -358,7 +358,8 @@ static void remoteClientCloseFunc(virNetClientPtr client ATTRIBUTE_UNUSED, closeCallback(conn, reason, closeOpaque); virMutexLock(&conn->lock); conn->closeDispatch = false; - if (conn->closeUnregisterCount != closeUnregisterCount) + if (conn->closeUnregisterCount != closeUnregisterCount && + closeFreeCallback) closeFreeCallback(closeOpaque); } virMutexUnlock(&conn->lock); -- 1.7.9.5

On 03/26/13 10:54, Viktor Mihajlovski wrote:
Check function pointer before calling.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> --- src/remote/remote_driver.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 3721af9..885120e 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -358,7 +358,8 @@ static void remoteClientCloseFunc(virNetClientPtr client ATTRIBUTE_UNUSED, closeCallback(conn, reason, closeOpaque); virMutexLock(&conn->lock); conn->closeDispatch = false; - if (conn->closeUnregisterCount != closeUnregisterCount) + if (conn->closeUnregisterCount != closeUnregisterCount && + closeFreeCallback) closeFreeCallback(closeOpaque); } virMutexUnlock(&conn->lock);
ACK. The free callback isn't mandatory in the setter function so here it can't be expected either. Peter

On 03/27/13 10:36, Peter Krempa wrote:
On 03/26/13 10:54, Viktor Mihajlovski wrote:
Check function pointer before calling.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> --- src/remote/remote_driver.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
ACK. The free callback isn't mandatory in the setter function so here it can't be expected either.
I pushed this one now.
Peter

Before closing the connection we unregister the close callback to prevent a reference leak. We make sure that we only unregister if we have previously registered a callback (not the case for virsh connect!). Further, the messages on virConnectClose != 0 are a bit more specific now. Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> --- tools/virsh.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/tools/virsh.c b/tools/virsh.c index b574d7e..3c0b398 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -311,6 +311,8 @@ vshCatchDisconnect(virConnectPtr conn ATTRIBUTE_UNUSED, disconnected++; } +static int callback_registered = 0; + /* * vshReconnect: * @@ -336,9 +338,13 @@ vshReconnect(vshControl *ctl) else vshError(ctl, "%s", _("failed to connect to the hypervisor")); } else { - if (virConnectRegisterCloseCallback(ctl->conn, vshCatchDisconnect, - NULL, NULL) < 0) - vshError(ctl, "%s", _("Unable to register disconnect callback")); + if (!callback_registered) { + if (virConnectRegisterCloseCallback(ctl->conn, vshCatchDisconnect, + NULL, NULL) == 0) + callback_registered = 1; + else + vshError(ctl, "%s", _("Unable to register disconnect callback")); + } if (connected) vshError(ctl, "%s", _("Reconnected to the hypervisor")); } @@ -2665,9 +2671,14 @@ vshDeinit(vshControl *ctl) VIR_FREE(ctl->name); if (ctl->conn) { int ret; - if ((ret = virConnectClose(ctl->conn)) != 0) { - vshError(ctl, _("Failed to disconnect from the hypervisor, %d leaked reference(s)"), ret); - } + if (callback_registered && + virConnectUnregisterCloseCallback(ctl->conn, vshCatchDisconnect) < 0) + vshError(ctl, "%s", _("Failed to unregister disconnect callback")); + ret = virConnectClose(ctl->conn); + if (ret < 0) + vshError(ctl, "%s", _("Failed to disconnect from the hypervisor")); + else if (ret > 0) + vshError(ctl, "%s", _("Leaked reference(s) after disconnect from the hypervisor")); } virResetLastError(); -- 1.7.9.5

On 03/26/13 10:54, Viktor Mihajlovski wrote:
Before closing the connection we unregister the close callback to prevent a reference leak. We make sure that we only unregister if we have previously registered a callback (not the case for virsh connect!). Further, the messages on virConnectClose != 0
In that case the callback should be registered also on "virsh connect" and also you did not unregister the callback there if you are changing from an active connection.
are a bit more specific now.
I like this change.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> --- tools/virsh.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/tools/virsh.c b/tools/virsh.c index b574d7e..3c0b398 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -311,6 +311,8 @@ vshCatchDisconnect(virConnectPtr conn ATTRIBUTE_UNUSED, disconnected++; }
+static int callback_registered = 0; +
When you actually register and unregister the callback in cmdConnect, you don't need to have this global variable. This change will need rearrangement of some code. I'll send the patch for this. Peter
participants (3)
-
Daniel P. Berrange
-
Peter Krempa
-
Viktor Mihajlovski