On 06/23/2017 05:39 AM, Michal Privoznik wrote:
> On 06/23/2017 12:10 AM, John Ferlan wrote:
>> If a remote call fails during event registration (more than likely from
>> a network failure or remote libvirtd restart timed just right), then when
>> calling the virObjectEventStateDeregisterID we don't want to call the
>> registered @freecb function because that breaks our contract that we
>> would only call it after succesfully returning. If the @freecb routine
>> were called, it could result in a double free from properly coded
>> applications that free their opaque data on failure to register, as seen
>> in the following details:
>>
>> Program terminated with signal 6, Aborted.
>> #0 0x00007fc45cba15d7 in raise
>> #1 0x00007fc45cba2cc8 in abort
>> #2 0x00007fc45cbe12f7 in __libc_message
>> #3 0x00007fc45cbe86d3 in _int_free
>> #4 0x00007fc45d8d292c in PyDict_Fini
>> #5 0x00007fc45d94f46a in Py_Finalize
>> #6 0x00007fc45d960735 in Py_Main
>> #7 0x00007fc45cb8daf5 in __libc_start_main
>> #8 0x0000000000400721 in _start
>>
>> The double dereference of 'pyobj_cbData' is triggered in the following
way:
>>
>> (1) libvirt_virConnectDomainEventRegisterAny is invoked.
>> (2) the event is successfully added to the event callback list
>> (virDomainEventStateRegisterClient in
>> remoteConnectDomainEventRegisterAny returns 1 which means ok).
>> (3) when function remoteConnectDomainEventRegisterAny is hit,
>> network connection disconnected coincidently (or libvirtd is
>> restarted) in the context of function 'call' then the connection
>> is lost and the function 'call' failed, the branch
>> virObjectEventStateDeregisterID is therefore taken.
>> (4) 'pyobj_conn' is dereferenced the 1st time in
>> libvirt_virConnectDomainEventFreeFunc.
>> (5) 'pyobj_cbData' (refered to pyobj_conn) is dereferenced the
>> 2nd time in libvirt_virConnectDomainEventRegisterAny.
>> (6) the double free error is triggered.
>>
>> Resolve this by adding an @inhibitFreeCb boolean in order to avoid calling
>> freecb in virObjectEventStateDeregisterID for any remote call failure in
>> a remoteConnect*EventRegister* API. For remoteConnect*EventDeregister* calls,
>> the passed value would be false indicating they should run the freecb if it
>> exists.
>>
>> Patch based on the investigation and initial patch posted by:
>> fangying <fangying1(a)huawei.com>.
>>
>> Signed-off-by: John Ferlan <jferlan(a)redhat.com>
>> ---
>> Initial patch found at:
>>
>>
https://www.redhat.com/archives/libvir-list/2017-June/msg00039.html
>>
>> based on feedback from Daniel Berrange to a previous posting:
>>
>>
https://www.redhat.com/archives/libvir-list/2017-May/msg01089.html
>>
>> Since no new patch was posted, I posted my idea from review for
>> consideration.
>>
>> src/bhyve/bhyve_driver.c | 2 +-
>> src/conf/domain_event.c | 2 +-
>> src/conf/object_event.c | 23 +++++++++++++++++------
>> src/conf/object_event.h | 3 ++-
>> src/libxl/libxl_driver.c | 2 +-
>> src/lxc/lxc_driver.c | 2 +-
>> src/network/bridge_driver.c | 2 +-
>> src/node_device/node_device_driver.c | 2 +-
>> src/qemu/qemu_driver.c | 4 ++--
>> src/remote/remote_driver.c | 32 ++++++++++++++++----------------
>> src/secret/secret_driver.c | 2 +-
>> src/storage/storage_driver.c | 2 +-
>> src/test/test_driver.c | 8 ++++----
>> src/uml/uml_driver.c | 2 +-
>> src/vz/vz_driver.c | 2 +-
>> src/xen/xen_driver.c | 2 +-
>> 16 files changed, 52 insertions(+), 40 deletions(-)
>>
>> diff --git a/src/bhyve/bhyve_driver.c b/src/bhyve/bhyve_driver.c
>> index ed2221a..6722dc4 100644
>> --- a/src/bhyve/bhyve_driver.c
>> +++ b/src/bhyve/bhyve_driver.c
>> @@ -1503,7 +1503,7 @@ bhyveConnectDomainEventDeregisterAny(virConnectPtr conn,
>>
>> if (virObjectEventStateDeregisterID(conn,
>> privconn->domainEventState,
>> - callbackID) < 0)
>> + callbackID, false) < 0)
>> return -1;
>>
>> return 0;
>> diff --git a/src/conf/domain_event.c b/src/conf/domain_event.c
>> index 6e471d7..ff4c602 100644
>> --- a/src/conf/domain_event.c
>> +++ b/src/conf/domain_event.c
>> @@ -2301,7 +2301,7 @@ virDomainEventStateDeregister(virConnectPtr conn,
>> NULL);
>> if (callbackID < 0)
>> return -1;
>> - return virObjectEventStateDeregisterID(conn, state, callbackID);
>> + return virObjectEventStateDeregisterID(conn, state, callbackID, false);
>
> Why is this false? virDomainEventStateDeregister is called directly from
> public API implementations. For instance from
> qemuConnectDomainEventDeregister(). Don't we want to run free callback then?
>
and the eventual check in virObjectEventCallbackListRemoveID is:
/* inhibiting calling @freecb from error paths in register
* functions ensures the caller of the register function won't
* end up with a double free error */
if (!inhibitFreeCb && cb->freecb)
IOW: Any time called from a Deregister path, we call the free callback;
however, when called from a RPC call REGISTER path, then we don't want
to call the free callback because we're returning -1 to the caller and
the caller will free things.
Ah, the negated logic confused me. I think I'd be nicer if we've used
boolean in straight logic (= do call) instead of reversed (= don't call).
What do you think?
Michal