Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility"):
Ok, thanks. I'm currently testing on your git branch referenced earlier
in this thread:
git://xenbits.xen.org/people/iwj/xen.git#wip.enumerate-pids-v2.1
Great. That's the one. My current version is pretty much identical -
some unused variables deleted and comments edited.
> * You need to fix the timer deregistration arrangements in the
> libvirt/libxl driver to avoid the crash you identified the other day.
Yes, I'm testing a fix now.
Great.
> * Something needs to be done about the 20ms slop in the libvirt event
> loop (as it could cause libxl to lock up). If you can't get rid of
> it in the libvirt core, then adding 20ms to every requested
> callback time in the libvirt/libxl driver would work for now.
>
The commit msg adding the fuzz says
Fix event test timer checks on kernels with HZ=100
On kernels with HZ=100, the resolution of sleeps in poll() is
quite bad. Doing a precise check on the expiry time vs the
current time will thus often think the timer has not expired
even though we're within 10ms of the expected expiry time. This
then causes another pointless sleep in poll() for <10ms. Timers
do not need to have such precise expiration, so we treat a timer
as expired if it is within 20ms of the expected expiry time. This
also fixes the eventtest.c test suite on kernels with HZ=100
I think this is a bug in the kernel. poll() may sleep longer, but not
shorter, than expected.
* daemon/event.c: Add 20ms fuzz when checking for timer expiry
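
To make the fuzz concrete, here is a minimal sketch of the two kinds of expiry check the commit message contrasts. It is not the code from daemon/event.c; the function names, the constant TIMER_FUZZ_MS and the millisecond int64_t timestamps are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant mirroring the 20ms fuzz described above. */
#define TIMER_FUZZ_MS 20

/* Precise check: the timer is due only once its expiry time has been reached. */
bool timer_due_precise(int64_t expires_at_ms, int64_t now_ms)
{
    return expires_at_ms <= now_ms;
}

/* Fuzzy check: the timer is also treated as due when its expiry time
 * lies at most fuzz_ms in the future. */
bool timer_due_fuzzy(int64_t expires_at_ms, int64_t now_ms, int64_t fuzz_ms)
{
    assert(fuzz_ms >= 0);
    return expires_at_ms <= now_ms + fuzz_ms;
}
```

With a timer set for t=1000ms and the loop waking at t=990ms, the precise check says "not yet" while the fuzzy check fires the callback 10ms early, which is the behaviour under discussion.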
I could handle this in the libxl driver as you say, but doing so makes
me a bit nervous. Potentially locking up libxl makes me nervous too :).
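
For reference, the driver-side workaround could look roughly like the sketch below: pad every deadline handed to the event loop by the loop's slop, so that a loop which fires up to 20ms early never fires before the time libxl actually asked for. The names pad_deadline_ms, loop_considers_due and LOOP_FUZZ_MS are hypothetical, and the loop's check is modelled on the commit message, not copied from libvirt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed slop of the event loop, taken from the 20ms in the commit message. */
#define LOOP_FUZZ_MS 20

/* Model of the loop's check: a timer is due if it expires within the fuzz. */
bool loop_considers_due(int64_t expires_at_ms, int64_t now_ms)
{
    return expires_at_ms <= now_ms + LOOP_FUZZ_MS;
}

/* Workaround: register the timer LOOP_FUZZ_MS later than requested, so the
 * earliest moment the loop can fire it is the originally requested time. */
int64_t pad_deadline_ms(int64_t requested_ms)
{
    assert(requested_ms <= INT64_MAX - LOOP_FUZZ_MS); /* guard against overflow */
    return requested_ms + LOOP_FUZZ_MS;
}
```

The cost of this approach is that, on a loop that wakes late, callbacks arrive up to 20ms later than they otherwise would.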
I was going to say that the code in libxl_osevent_occurred_timeout
checked the time against the requested time and would ignore the event
(thinking it was stale) if it was too early.
But in fact now that I read the code this is not true. In fact I
think it will work OK (modulo some things happening too soon). So the
upshot is that I still think this is a bug in libvirt but I don't
think it's critical to fix it.
Sorry to cause undue alarm.
Yes. I've been running my tests for about 24 hours now with no problems
noted. The tests include starting/stopping a persistent VM,
creating/stopping a transient VM, rebooting a persistent VM,
saving/restoring a transient VM, and getting info on all of these VMs.
I should probably add saving/restoring a persistent VM to the mix since
the associated libxl_ctx is never freed. Only when a persistent VM is
undefined is the libxl_ctx freed.
Right. Great.
Thanks,
Ian.