Re: [libvirt] [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility

[Adding libvirt list...] Ian Jackson wrote:
Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility"):
BTW, I only see the crash when the save/restore script is running. I stopped the other scripts and domains, running only save/restore on a single domain, and see the crash rather quickly (within 10 iterations).
I'll look at the libvirt code, but:
With a recurring timeout, how can you ever know it's cancelled ? There might be threads out there, which don't hold any locks, which are in the process of executing a callback for a timeout. That might be arbitrarily delayed from the pov of the rest of the program.
E.g.:
Thread A Thread B
invoke some libxl operation X do some libxl stuff X register timeout (libxl) XV record timeout info X do some more libxl stuff ... X do some more libxl stuff X deregister timeout (libxl internal) X converted to request immediate timeout XV record new timeout info X release libvirt event loop lock entering libvirt event loop V observe timeout is immediate V need to do callback call libxl driver
entering libvirt event loop V observe timeout is immediate V need to do callback call libxl driver call libxl X libxl sees timeout is live X libxl does libxl stuff libxl driver deregisters V record lack of timeout free driver's timeout struct call libxl X libxl sees timeout is dead X libxl does nothing libxl driver deregisters V CRASH due to deregistering V already-deregistered timeout
If this is how things are, then I think there is no sane way to use libvirt's timeouts (!)
Looking at the libvirt code again, it seems a single thread services the event loop. See virNetServerRun() in src/util/virnetserver.c. Indeed, I see the same thread ID in all the timer and fd callbacks. One of the libvirt core devs can correct me if I'm wrong. Regards, Jim
In principle I guess the driver could keep its per-timeout structs around forever and remember whether they've been deregistered or not. Each one would have to have a lock in it.
But if you think about it, if you have 10 threads all running the event loop and you set a timeout to zero, doesn't that mean that every thread's event loop should do the timeout callback as fast as it can ? That could be a lot of wasted effort.
The best solution would appear to be to provide a non-recurring callback.
I'm not so thrilled with the timeout handling code in the libvirt libxl driver. The driver maintains a list of active timeouts because IIRC, there were cases when the driver received timeout deregistrations when calling libxl_ctx_free, at which point some of the associated structures were freed. The idea was to call libxl_osevent_occurred_timeout on any active timeouts before freeing libxlDomainObjPrivate and its contents.
libxl does deregister fd callbacks in libxl_ctx_free.
But libxl doesn't currently "deregister" any timeouts in libxl_ctx_free; indeed it would be a bit daft for it to do so as at libxl_ctx_free there are no aos running so there would be nothing to time out.
But there is a difficulty with timeouts which libxl has set to occur immediately but which have not yet actually had the callback. The the application cannot call libxl_ctx_free with such timeouts outstanding, because that would imply later calling back into libxl with a stale ctx.
(Looking at the code I see that the "nexi" are never actually freed. Bah.)
Thanks, Ian.

On Mon, Jan 27, 2014 at 06:39:36PM -0700, Jim Fehlig wrote:
[Adding libvirt list...]
Ian Jackson wrote:
Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility"):
BTW, I only see the crash when the save/restore script is running. I stopped the other scripts and domains, running only save/restore on a single domain, and see the crash rather quickly (within 10 iterations).
I'll look at the libvirt code, but:
With a recurring timeout, how can you ever know it's cancelled ? There might be threads out there, which don't hold any locks, which are in the process of executing a callback for a timeout. That might be arbitrarily delayed from the pov of the rest of the program.
E.g.:
Thread A Thread B
invoke some libxl operation X do some libxl stuff X register timeout (libxl) XV record timeout info X do some more libxl stuff ... X do some more libxl stuff X deregister timeout (libxl internal) X converted to request immediate timeout XV record new timeout info X release libvirt event loop lock entering libvirt event loop V observe timeout is immediate V need to do callback call libxl driver
entering libvirt event loop V observe timeout is immediate V need to do callback call libxl driver call libxl X libxl sees timeout is live X libxl does libxl stuff libxl driver deregisters V record lack of timeout free driver's timeout struct call libxl X libxl sees timeout is dead X libxl does nothing libxl driver deregisters V CRASH due to deregistering V already-deregistered timeout
If this is how things are, then I think there is no sane way to use libvirt's timeouts (!)
Looking at the libvirt code again, it seems a single thread services the event loop. See virNetServerRun() in src/util/virnetserver.c. Indeed, I see the same thread ID in all the timer and fd callbacks. One of the libvirt core devs can correct me if I'm wrong.
Yes, you are correct. The threading model for libvirtd is that the process thread leader executes the event loop, dealing with timer and file descriptor I/O callbacks. There are also 'n' worker threads which exclusively handle public API calls from libvirt clients. IOW all your timer callbacks will be in one thread - which also means you want your timer callbacks to be fast to execute. If you have any very slow code you'll want to spawn a temporary worker thread from the timer callback to do the slow work. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Daniel P. Berrange writes ("Re: [libvirt] [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility"):
Yes, you are correct. The threading model for libvirtd is that the process thread leader executes the event loop, dealing with timer and file descriptor I/O callbacks. There are also 'n' worker threads which exclusively handle public API calls from libvirt clients. IOW all your timer callbacks will be in one thread - which also means you want your timer callbacks to be fast to execute.
Right. Good. All of libxl's callbacks should be fast. There is still the problem that libxl functions on other threads may make an fd deregistration call, while libvirt's event loop is still polling (or selecting) on the fd in the main loop. libxl might then close the fd, dup2 something onto it, leave the fd to be reused for some other object. Depending on the underlying kernel, that can cause side effects. I have reproduced the analogous bug with libxl's event loop, but I had to use an fd connected to the controlling tty, from a background process group, when the tty has stty tostop. polling such a thing for POLLOUT raises SIGTTOU. Of course the libvirt xl driver still needs to cope with being told by libxl to register or modify a timeout, or register deregister or modify an fd, in other than the master thread. Ian.

On Mon, Jan 27, 2014 at 06:39:36PM -0700, Jim Fehlig wrote:
[Adding libvirt list...]
Ian Jackson wrote:
Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility"):
BTW, I meant to ask before - what is the SIGCHLD reference about in the subject line ? libvirt drivers that live inside libvirtd should never use or rely on the SIGCHLD signal at all. All VM processes started by libvirtd ought to be fully daemonized so that their parent is pid 1 / init. This ensures that the libvirtd daemon can be restarted without all the VMs getting reaped. Once the VMs are reparented to init, then libvirt or library code it uses has no way of ever receiving SIGCHLD. I'm unclear if the libvirt libxl driver is currently daemonizing the processes associated with VMs it starts, but if not, it really should do. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Daniel P. Berrange wrote:
On Mon, Jan 27, 2014 at 06:39:36PM -0700, Jim Fehlig wrote:
[Adding libvirt list...]
Ian Jackson wrote:
Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility"):
BTW, I meant to ask before - what is the SIGCHLD reference about in the subject line ?
This is related to child processes started by libxl. E.g. running a bootloader when creating PV VMs, running a save/restore helper when saving/restoring a VM, etc.
libvirt drivers that live inside libvirtd should never use or rely on the SIGCHLD signal at all. All VM processes started by libvirtd ought to be fully daemonized so that their parent is pid 1 / init. This ensures that the libvirtd daemon can be restarted without all the VMs getting reaped. Once the VMs are reparented to init, then libvirt or library code it uses has no way of ever receiving SIGCHLD.
Nod. VMs are "daemonized", in the context of Xen. Once running, libvirtd can be restarted without reaping them. Regards, Jim

On Thu, Jan 30, 2014 at 09:14:01AM -0700, Jim Fehlig wrote:
Daniel P. Berrange wrote:
On Mon, Jan 27, 2014 at 06:39:36PM -0700, Jim Fehlig wrote:
[Adding libvirt list...]
Ian Jackson wrote:
Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility"):
BTW, I meant to ask before - what is the SIGCHLD reference about in the subject line ?
This is related to child processes started by libxl. E.g. running a bootloader when creating PV VMs, running a save/restore helper when saving/restoring a VM, etc.
libvirt drivers that live inside libvirtd should never use or rely on the SIGCHLD signal at all. All VM processes started by libvirtd ought to be fully daemonized so that their parent is pid 1 / init. This ensures that the libvirtd daemon can be restarted without all the VMs getting reaped. Once the VMs are reparented to init, then libvirt or library code it uses has no way of ever receiving SIGCHLD.
Nod. VMs are "daemonized", in the context of Xen. Once running, libvirtd can be restarted without reaping them.
Great, thanks for clarifying, this sounds fine then. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility"):
Looking at the libvirt code again, it seems a single thread services the event loop. See virNetServerRun() in src/util/virnetserver.c. Indeed, I see the same thread ID in all the timer and fd callbacks. One of the libvirt core devs can correct me if I'm wrong.
OK. So just to recap where we stand: * I think libxl needs the SIGCHLD flexibility series. I'll repost that (v3) but it's had hardly any changes. * You need to fix the timer deregistration arrangements in the libvirt/libxl driver to avoid the crash you identified the other day. * Something needs to be done about the 20ms slop in the libvirt event loop (as it could cause libxl to lock up). If you can't get rid of it in the libvirt core, then adding 20ms to the every requested callback time in the libvirt/libxl driver would work for now. * I think we can get away with not doing anything about the fd deregistration race in libvirt because both Linux and FreeBSD have behaviours that are tolerable. * libxl should have the fd deregistration race fixed in Xen 4.5. Have you managed to fix the timer deregistration crash, and retest ? Thanks, Ian.

Ian Jackson wrote:
Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility"):
Looking at the libvirt code again, it seems a single thread services the event loop. See virNetServerRun() in src/util/virnetserver.c. Indeed, I see the same thread ID in all the timer and fd callbacks. One of the libvirt core devs can correct me if I'm wrong.
OK. So just to recap where we stand:
* I think libxl needs the SIGCHLD flexibility series. I'll repost that (v3) but it's had hardly any changes.
Ok, thanks. I'm currently testing on your git branch referenced earlier in this thread git://xenbits.xen.org/people/iwj/xen.git#wip.enumerate-pids-v2.1
* You need to fix the timer deregistration arrangements in the libvirt/libxl driver to avoid the crash you identified the other day.
Yes, I'm testing a fix now.
* Something needs to be done about the 20ms slop in the libvirt event loop (as it could cause libxl to lock up). If you can't get rid of it in the libvirt core, then adding 20ms to the every requested callback time in the libvirt/libxl driver would work for now.
The commit msg adding the fuzz says Fix event test timer checks on kernels with HZ=100 On kernels with HZ=100, the resolution of sleeps in poll() is quite bad. Doing a precise check on the expiry time vs the current time will thus often thing the timer has not expired even though we're within 10ms of the expected expiry time. This then causes another pointless sleep in poll() for <10ms. Timers do not need to have such precise expiration, so we treat a timer as expired if it is within 20ms of the expected expiry time. This also fixes the eventtest.c test suite on kernels with HZ=100 * daemon/event.c: Add 20ms fuzz when checking for timer expiry I could handle this in the libxl driver as you say, but doing so makes me a bit nervous. Potentially locking up libxl makes me nervous too :).
* I think we can get away with not doing anything about the fd deregistration race in libvirt because both Linux and FreeBSD have behaviours that are tolerable.
* libxl should have the fd deregistration race fixed in Xen 4.5.
Have you managed to fix the timer deregistration crash, and retest ?
Yes. I've been running my tests for about 24 hours now with no problems noted. The tests include starting/stopping a persistent VM, creating/stopping a transient VM, rebooting a persistent VM, saving/restoring a transient VM, and getting info on all of these VMs. I should probably add saving/restoring a persistent VM to the mix since the associated libxl_ctx is never freed. Only when a persistent VM is undefined is the libxl_ctx freed. Regards, Jim

Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility"):
Ok, thanks. I'm currently testing on your git branch referenced earlier in this thread
git://xenbits.xen.org/people/iwj/xen.git#wip.enumerate-pids-v2.1
Great. That's the one. My current version is pretty much identical - some unused variables deleted and comments edited.
* You need to fix the timer deregistration arrangements in the libvirt/libxl driver to avoid the crash you identified the other day.
Yes, I'm testing a fix now.
Great.
* Something needs to be done about the 20ms slop in the libvirt event loop (as it could cause libxl to lock up). If you can't get rid of it in the libvirt core, then adding 20ms to the every requested callback time in the libvirt/libxl driver would work for now.
The commit msg adding the fuzz says
Fix event test timer checks on kernels with HZ=100
On kernels with HZ=100, the resolution of sleeps in poll() is quite bad. Doing a precise check on the expiry time vs the current time will thus often thing the timer has not expired even though we're within 10ms of the expected expiry time. This then causes another pointless sleep in poll() for <10ms. Timers do not need to have such precise expiration, so we treat a timer as expired if it is within 20ms of the expected expiry time. This also fixes the eventtest.c test suite on kernels with HZ=100
I think this is a bug in the kernel. poll() may sleep longer, but not shorter, than expected.
* daemon/event.c: Add 20ms fuzz when checking for timer expiry
I could handle this in the libxl driver as you say, but doing so makes me a bit nervous. Potentially locking up libxl makes me nervous too :).
I was going to say that the code in libxl_osevent_occurred_timeout checked the time against the requested time and would ignore the event (thinking it was stale) if it was too early. But in fact now that I read the code this is not true. In fact I think it will work OK (modulo some things happening too soon). So the upshot is that I still think this is a bug in libvirt but I don't think it's critical to fix it. Sorry to cause undue alarm.
Yes. I've been running my tests for about 24 hours now with no problems noted. The tests include starting/stopping a persistent VM, creating/stopping a transient VM, rebooting a persistent VM, saving/restoring a transient VM, and getting info on all of these VMs.
I should probably add saving/restoring a persistent VM to the mix since the associated libxl_ctx is never freed. Only when a persistent VM is undefined is the libxl_ctx freed.
Right. Great. Thanks, Ian.
participants (3)
-
Daniel P. Berrange
-
Ian Jackson
-
Jim Fehlig