[Adding libvirt list...]
Ian Jackson wrote:
Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork:
SIGCHLD flexibility"):
> BTW, I only see the crash when the save/restore script is running. I
> stopped the other scripts and domains, running only save/restore on a
> single domain, and see the crash rather quickly (within 10 iterations).
>
I'll look at the libvirt code, but:
With a recurring timeout, how can you ever know it's cancelled ?
There might be threads out there, which don't hold any locks, which
are in the process of executing a callback for a timeout. That might
be arbitrarily delayed from the pov of the rest of the program.
E.g.:
Thread A Thread B
invoke some libxl operation
X do some libxl stuff
X register timeout (libxl)
XV record timeout info
X do some more libxl stuff
...
X do some more libxl stuff
X deregister timeout (libxl internal)
X converted to request immediate timeout
XV record new timeout info
X release libvirt event loop lock
entering libvirt event loop
V observe timeout is immediate
V need to do callback
call libxl driver
entering libvirt event loop
V observe timeout is immediate
V need to do callback
call libxl driver
call libxl
X libxl sees timeout is live
X libxl does libxl stuff
libxl driver deregisters
V record lack of timeout
free driver's timeout struct
call libxl
X libxl sees timeout is dead
X libxl does nothing
libxl driver deregisters
V CRASH due to deregistering
V already-deregistered timeout
If this is how things are, then I think there is no sane way to use
libvirt's timeouts (!)
Looking at the libvirt code again, it seems a single thread services the
event loop. See virNetServerRun() in src/util/virnetserver.c. Indeed, I
see the same thread ID in all the timer and fd callbacks. One of the
libvirt core devs can correct me if I'm wrong.
Regards,
Jim
In principle I guess the driver could keep its per-timeout structs
around forever and remember whether they've been deregistered or not.
Each one would have to have a lock in it.
But if you think about it, if you have 10 threads all running the
event loop and you set a timeout to zero, doesn't that mean that every
thread's event loop should do the timeout callback as fast as it can ?
That could be a lot of wasted effort.
The best solution would appear to be to provide a non-recurring
callback.
> I'm not so thrilled with the timeout handling code in the libvirt libxl
> driver. The driver maintains a list of active timeouts because IIRC,
> there were cases when the driver received timeout deregistrations when
> calling libxl_ctx_free, at which point some of the associated structures
> were freed. The idea was to call libxl_osevent_occurred_timeout on any
> active timeouts before freeing libxlDomainObjPrivate and its contents.
>
libxl does deregister fd callbacks in libxl_ctx_free.
But libxl doesn't currently "deregister" any timeouts in
libxl_ctx_free; indeed it would be a bit daft for it to do so as at
libxl_ctx_free there are no aos running so there would be nothing to
time out.
But there is a difficulty with timeouts which libxl has set to occur
immediately but which have not yet actually had the callback. The the
application cannot call libxl_ctx_free with such timeouts outstanding,
because that would imply later calling back into libxl with a stale
ctx.
(Looking at the code I see that the "nexi" are never actually freed.
Bah.)
Thanks,
Ian.