[libvirt] libvirt-python bug: custom event loop impl calls ff callback from *Remove(Handle|Timeout)Func

Hello libvirt-list, As of current libvirt-python.git, according to libvirt-override.c, if implementing custom event loop in Python, ff callback is called from libvirt_virEventRemoveHandleFunc, which is a C glue between virEventRegisterImpl and actual removeHandle function written in Python:
result = PyEval_CallObject(removeHandleObj, pyobj_args); if (!result) { PyErr_Print(); PyErr_Clear(); } else if (!PyTuple_Check(result) || PyTuple_Size(result) != 3) { DEBUG("%s: %s must return opaque obj registered with %s" "to avoid leaking libvirt memory\n", __FUNCTION__, NAME(removeHandle), NAME(addHandle)); } else { opaque = PyTuple_GetItem(result, 1); ff = PyTuple_GetItem(result, 2); cff = PyvirFreeCallback_Get(ff); if (cff) (*cff)(PyvirVoidPtr_Get(opaque)); retval = 0; }
This is exactly what one shoud not be doing according to documentation [1]:
If the opaque user data requires free'ing when the handle is unregistered, then a 2nd callback can be supplied for this purpose. This callback needs to be invoked from a clean stack. If 'ff' callbacks are invoked directly from the virEventRemoveHandleFunc they will likely deadlock in libvirt.
[1] https://libvirt.org/html/libvirt-libvirt-event.html#virEventAddHandleFunc This is true, the deadlock occurs. When the "result" tuple is mangled to have None as third item ("ff"), then cff = PyvirFreeCallback_Get(ff) is NULL and the deadlock does not happen. That's also why script examples/event-test.py does not deadlock, because it does not return anything (that is, returns None) from Python, so the second if block happens and the ff callback, if any, is not executed (and probably something leaks, but I didn't check for that). Everything also applies to to timeouts (libvirt_virEventRemoteTimeoutFunc). -- pozdrawiam / best regards _.-._ Wojtek Porczyk .-^' '^-. Invisible Things Lab |'-.-^-.-'| | | | | I do not fear computers, | '-.-' | I fear lack of them. '-._ : ,-' -- Isaac Asimov `^-^-_>

On Thu, Jan 19, 2017 at 12:48:11PM +0100, Wojtek Porczyk wrote:
Hello libvirt-list,
As of current libvirt-python.git, according to libvirt-override.c, if implementing custom event loop in Python, ff callback is called from libvirt_virEventRemoveHandleFunc, which is a C glue between virEventRegisterImpl and actual removeHandle function written in Python:
result = PyEval_CallObject(removeHandleObj, pyobj_args); if (!result) { PyErr_Print(); PyErr_Clear(); } else if (!PyTuple_Check(result) || PyTuple_Size(result) != 3) { DEBUG("%s: %s must return opaque obj registered with %s" "to avoid leaking libvirt memory\n", __FUNCTION__, NAME(removeHandle), NAME(addHandle)); } else { opaque = PyTuple_GetItem(result, 1); ff = PyTuple_GetItem(result, 2); cff = PyvirFreeCallback_Get(ff); if (cff) (*cff)(PyvirVoidPtr_Get(opaque)); retval = 0; }
This is exactly what one shoud not be doing according to documentation [1]:
If the opaque user data requires free'ing when the handle is unregistered, then a 2nd callback can be supplied for this purpose. This callback needs to be invoked from a clean stack. If 'ff' callbacks are invoked directly from the virEventRemoveHandleFunc they will likely deadlock in libvirt.
[1] https://libvirt.org/html/libvirt-libvirt-event.html#virEventAddHandleFunc
This is true, the deadlock occurs. When the "result" tuple is mangled to have None as third item ("ff"), then cff = PyvirFreeCallback_Get(ff) is NULL and the deadlock does not happen.
That's also why script examples/event-test.py does not deadlock, because it does not return anything (that is, returns None) from Python, so the second if block happens and the ff callback, if any, is not executed (and probably something leaks, but I didn't check for that).
Everything also applies to to timeouts (libvirt_virEventRemoteTimeoutFunc).
Yes, your analysis looks correct to me. I don't think it is easy to fix this in the libvirt-override.c file really, unless we were to spawn a throw away thread to call "ff" in. The better approach would be for the user suppied event loop impl of removeHandle to arrange to call 'ff' themselves at the correct time. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On Thu, Jan 19, 2017 at 11:56:36AM +0000, Daniel P. Berrange wrote:
On Thu, Jan 19, 2017 at 12:48:11PM +0100, Wojtek Porczyk wrote:
Hello libvirt-list, (snip)
} else { opaque = PyTuple_GetItem(result, 1); ff = PyTuple_GetItem(result, 2); cff = PyvirFreeCallback_Get(ff); if (cff) (*cff)(PyvirVoidPtr_Get(opaque)); retval = 0; }
This is exactly what one shoud not be doing according to documentation [1]:
If the opaque user data requires free'ing when the handle is unregistered, then a 2nd callback can be supplied for this purpose. This callback needs to be invoked from a clean stack. If 'ff' callbacks are invoked directly from the virEventRemoveHandleFunc they will likely deadlock in libvirt.
[1] https://libvirt.org/html/libvirt-libvirt-event.html#virEventAddHandleFunc
(snip)
Yes, your analysis looks correct to me. I don't think it is easy to fix this in the libvirt-override.c file really, unless we were to spawn a throw away thread to call "ff" in. The better approach would be for the user suppied event loop impl of removeHandle to arrange to call 'ff' themselves at the correct time.
IIUC, this cannot be done without involving user-supplied event loop at all. Unless I miss something (I'm not a C programmer really), that's the implication of "This callback needs to be invoked from a clean stack", since libvirt-override.c has access to "clean stack" only via whatever is implemented in python. That unfortunately means the python API (specifically the expected addHandle signature) should be amended to include explicit ff callback to be called directly from the loop after removeHandleFunc returns. That's not as bad as it sounds, because the argument can be optional in Python, so new Python code could be compatible with both old and new libvirt-python, and libvirt-override.c could catch TypeError exception to retry the function with different signature for existing code. That would still work not worse than present, since it will still leak memory. In any case, the ff callback should be dropped from libvirt_virEventRemoveHandleFunc(), because again, I suppose all currently running implementations leak like a sieve. I have no idea how to write that in C. -- pozdrawiam / best regards _.-._ Wojtek Porczyk .-^' '^-. Invisible Things Lab |'-.-^-.-'| | | | | I do not fear computers, | '-.-' | I fear lack of them. '-._ : ,-' -- Isaac Asimov `^-^-_>

On Thu, Jan 19, 2017 at 01:18:12PM +0100, Wojtek Porczyk wrote:
On Thu, Jan 19, 2017 at 11:56:36AM +0000, Daniel P. Berrange wrote:
On Thu, Jan 19, 2017 at 12:48:11PM +0100, Wojtek Porczyk wrote:
Hello libvirt-list, (snip)
} else { opaque = PyTuple_GetItem(result, 1); ff = PyTuple_GetItem(result, 2); cff = PyvirFreeCallback_Get(ff); if (cff) (*cff)(PyvirVoidPtr_Get(opaque)); retval = 0; }
This is exactly what one shoud not be doing according to documentation [1]:
If the opaque user data requires free'ing when the handle is unregistered, then a 2nd callback can be supplied for this purpose. This callback needs to be invoked from a clean stack. If 'ff' callbacks are invoked directly from the virEventRemoveHandleFunc they will likely deadlock in libvirt.
[1] https://libvirt.org/html/libvirt-libvirt-event.html#virEventAddHandleFunc
(snip)
Yes, your analysis looks correct to me. I don't think it is easy to fix this in the libvirt-override.c file really, unless we were to spawn a throw away thread to call "ff" in. The better approach would be for the user suppied event loop impl of removeHandle to arrange to call 'ff' themselves at the correct time.
IIUC, this cannot be done without involving user-supplied event loop at all. Unless I miss something (I'm not a C programmer really), that's the implication of "This callback needs to be invoked from a clean stack", since libvirt-override.c has access to "clean stack" only via whatever is implemented in python.
That unfortunately means the python API (specifically the expected addHandle signature) should be amended to include explicit ff callback to be called directly from the loop after removeHandleFunc returns. That's not as bad as it sounds, because the argument can be optional in Python, so new Python code could be compatible with both old and new libvirt-python, and libvirt-override.c could catch TypeError exception to retry the function with different signature for existing code. That would still work not worse than present, since it will still leak memory. In any case, the ff callback should be dropped from libvirt_virEventRemoveHandleFunc(), because again, I suppose all currently running implementations leak like a sieve.
I have no idea how to write that in C.
Would you mind creating a bug in our bugzilla [1] just so we don't lose track of this issue? Thank you very much in advance. Martin [1] https://bugzilla.redhat.com
-- pozdrawiam / best regards _.-._ Wojtek Porczyk .-^' '^-. Invisible Things Lab |'-.-^-.-'| | | | | I do not fear computers, | '-.-' | I fear lack of them. '-._ : ,-' -- Isaac Asimov `^-^-_>
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Fri, Feb 03, 2017 at 02:04:57PM +0100, Martin Kletzander wrote:
Would you mind creating a bug in our bugzilla [1] just so we don't lose track of this issue? Thank you very much in advance.
Sorry for the delay. Here it is: https://bugzilla.redhat.com/show_bug.cgi?id=1433028 -- pozdrawiam / best regards _.-._ Wojtek Porczyk .-^' '^-. Invisible Things Lab |'-.-^-.-'| | | | | I do not fear computers, | '-.-' | I fear lack of them. '-._ : ,-' -- Isaac Asimov `^-^-_>
participants (3)
-
Daniel P. Berrange
-
Martin Kletzander
-
Wojtek Porczyk