
Daniel P. Berrange wrote:
On Thu, Jan 17, 2013 at 10:29:35PM -0700, Jim Fehlig wrote:
Jim Fehlig wrote:
I've been investigating some races in the libxl driver and would like to get comments on some potential solutions.
The first race is in the fd/timeout event handling code, which maps libxl's osevent interface to libvirt's event loop interface. This mapping opens the possibility for libvirt's event loop to invoke event callbacks after libxl has already deregistered the event, potentially accessing an event object that has already been freed.
One solution to this race I've found successful is reference counting the objects associated with the events. When libxl registers an event, an object encapsulating the event is created and it's reference count is set to 1. When the event is injected into libvirt's event loop, another reference is taken on the object. When libxl deregisters the event, it's reference count is decremented. Once the event is removed from libvirt's event loop, the final reference is decremented and the object is disposed.
While rebasing this solution to use danpb's recent virObjectLockable change, I revisited using only a lock in the libxlDomainObject, acquiring the lock in the registration, deregistration, modify, and callback functions. I recall some problems with this approach before, but now find it to be sufficient. Perhaps a misplaced lock drove me to the unneeded complexity of this double lock nonsense...
This approach ensures the object is not disposed until it is removed from libvirt's event loop *and* libxl had explicitly deregistered the event. The notion of an event being 'disabled' found in libvirt's event loop impl also had to be added to the libxl event object, to ensure the driver doesn't call into libxl for a previously deregistered event.
The second race is between destroying a vm (i.e., calling privateDataFreeFunc, which frees the libxl ctx) and deregistration/cleanup of all events that have been registered by libxl.
One solution for this race is to convert libxlDomainObjPrivate to a virObject and increment its reference for each fd and timeout registration.
Only change here is using the new virObjectLockable.
I'm surprised that you need to have a separate lock for libxlDomainObjPrivate. The lifetime of that object is tied to the lifetime of the virDomainObjPtr, which already has a lock. Isn't it sufficient to do locking on the latter, whenever the libxlDomainObjPrivate need protecting ?
Yeah, I thought it would be sufficient too, but fd/timeout events can happen during vm operations where the virDomainObj is locked. E.g. during vm creation, the virDomainObj is locked but fd/timeout events occur as libxl reads/writes to xenstore, sets watches, reads a kernel/initrd from host, etc. It seems cleaner to have a lock in the libxlDomainObjPrivate for accessing libxl's event registrations. Regards, Jim