On 04/08/2013 08:27 PM, Eric Blake wrote:
On 04/08/2013 07:04 AM, Peter Krempa wrote:
>> Aiee, perhaps a race between a thread freeing a domain object (and the
>> private data) and another thread that happened to acquire the domain
>> object pointer before it was freed? Let me verify if that is possible.
>
> Ufff. The domain objects in the qemu driver don't use reference counting
> to track the lifecycles. Thus it's (theoretically) possible to acquire a
> lock of a domain object in one thread while another thread happens to
> free the domain object.
>
> I have a reproducer for this issue:

Thanks; I can confirm under valgrind that we have a use after free, with
all sorts of nasty heap corruption potential, after instrumenting my
source a bit more:

Once again, I'm trying to ascertain how far back this issue appears.
This time, it appears the problem is more recent. I initially suspected
commit d1c7b00b (Feb 2013, v1.0.3), since that rearranged the locks
inside virDomainObjListRemove, but even after instrumenting that
function, I was unable to get a crash; instead, I got the expected
lookup failure:

# virsh undefine fedora-local & sleep .1; virsh dominfo fedora-local
[1] 25898
Domain fedora-local has been undefined
error: failed to get domain 'fedora-local'
error: Domain not found: no domain with matching name 'fedora-local'
[1]+  Done                    virsh undefine fedora-local

It's too late for me now to search any more tonight, but on the bright
side, that narrows down the search, and we have at most two releases
affected (rather than all the way back to 0.10.0 on the race that
originally spawned this thread).
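
To make the lifecycle problem quoted at the top concrete, here is a minimal
sketch of what reference counting on domain objects would buy us. This is a
Python model for illustration only, not libvirt code: DomainObj, DomainList,
ref(), unref() and the "disposed" flag are all hypothetical names, and
"disposed" merely stands in for the object's memory having been freed.

```python
import threading


class DomainObj:
    """Hypothetical stand-in for a domain object with an explicit refcount."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()       # the per-domain lock
        self._refs = 1                     # the creator's reference
        self._refs_guard = threading.Lock()
        self.disposed = False              # models "memory has been freed"

    def ref(self):
        with self._refs_guard:
            assert not self.disposed, "ref() on an already freed object"
            self._refs += 1
        return self

    def unref(self):
        with self._refs_guard:
            self._refs -= 1
            if self._refs == 0:
                self.disposed = True       # last holder gone: safe to free


class DomainList:
    """Hypothetical list of domains, keyed by name."""

    def __init__(self):
        self._guard = threading.Lock()
        self._by_name = {}

    def add(self, dom):
        # The list takes over the creator's reference.
        with self._guard:
            self._by_name[dom.name] = dom

    def find(self, name):
        # Take the caller's reference while the list lock is still held,
        # so a concurrent remove() cannot drop the last reference first.
        with self._guard:
            dom = self._by_name.get(name)
            return dom.ref() if dom is not None else None

    def remove(self, name):
        with self._guard:
            dom = self._by_name.pop(name, None)
        if dom is not None:
            dom.unref()                    # drops only the list's reference


doms = DomainList()
doms.add(DomainObj("fedora-local"))
held = doms.find("fedora-local")   # the "dominfo" side grabs the object
doms.remove("fedora-local")        # the "undefine" side removes it
print(held.disposed)               # False: still alive while we hold a ref
held.unref()
print(held.disposed)               # True: freed only after the last unref
print(doms.find("fedora-local"))   # None: later lookups fail cleanly
```

The point of the sketch is only the ordering: removal from the list and
destruction of the object become two separate events, so a thread that
looked the domain up before the removal never touches freed memory.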
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org