On 11/23/2015 04:24 AM, Ian Jackson wrote:
Jim Fehlig writes ("[PATCH RFC] libxl: use libxl_event_wait to
process libxl events"):
> Prior to this patch, libxl events were delivered to libvirt via
> the libxlDomainEventHandler callback registered with libxl.
> Documenation in $xensrc/tools/libxl/libxl_event.h states that the
> callback "may occur on any thread in which the application calls
> libxl". This can result in deadlock since many of the libvirt
> callees of libxl hold a lock on the virDomainObj they are working
> on. When the callback is invoked, it attempts to find a virDomainObj
> corresponding to the domain ID provided by libxl. Searching the
> domain obj list results in locking each obj before checking if it is
> active, and its ID equals the requested ID. Deadlock is possible
> when attempting to lock an obj that is already locked further up
> the call stack. Indeed, Max Ustermann recently reported an instance
> of this deadlock
This sounds like a very plausible failure mode.
>
https://www.redhat.com/archives/libvir-list/2015-November/msg00130.html
>
> This patch moves processing of libxl events to a thread, where
> libxl_event_wait() is used to collect events. This allows processing
> libxl events asynchronously in libvirt, avoiding the deadlock.
The reasoning is sound and the remedy is IMO correct. (However, you
mean "this allows processing libxl events _synchronously_", since what
you are doing is serialising them all into your main loop.)
Ah yes, poor choice of words. The last sentence should read:
This approach gives libvirt more control over event processing, ensuring object
locking constraints can be met.
But your comment exposes a flaw in the patch, one that had already been fixed in
the event_occurs approach. Shutting down a large memory domain can take
considerable time due to memory scrubbing, meanwhile stalling reception of other
events. The patch was a bit aggressive in removing libxlDomainShutdownThread().
So:
Acked-by: Ian Jackson <ian.jackson(a)eu.citrix.com>
NB that I have not reviewed the patch in detail. I can do that if you
like, although of course my knowledge of the innards of libvirt is not
wonderful.
> The only reservations I have about this patch come from comments
> about libxl_event_wait in libxl_event.h
>
> Like libxl_event_check but blocks if no suitable events are
> available, until some are. Uses libxl_osevent_beforepoll/
> _afterpoll so may be inefficient if very many domains are being
> handled by a single program.
If this turns out to be a problem in practice, we will improve libxl's
data structures to not involve so many linear searches. (In fact I
think you are probably already calling synchronous libxl ao functions,
which have the same performance properties, although this is not
documented.)
Given what you say above I don't think there is a reasonable
alternative remedy. So you should go ahead with this patch.
Thanks for your comments and ACK'ing the change . I'll submit a V2 that retains
handling of shutdown event in a thread.
Regards,
Jim