Quoting Serge E. Hallyn (serge.hallyn(a)canonical.com):
Quoting Daniel P. Berrange (berrange(a)redhat.com):
> On Wed, Sep 28, 2011 at 02:14:52PM -0500, Serge E. Hallyn wrote:
> > Nova (openstack) calls libvirt to create a container, then
> > periodically checks using GetInfo to see whether the container
> > is up. If it does this too quickly, then libvirt returns an
> > error, which in libvirt.py causes an exception to be raised,
> > the same type as if the container was bad.
> lxcDomainGetInfo(), holds a mutex on 'dom' for the duration of
> its execution. It checks for virDomainObjIsActive() before
> trying to use the cgroups.
Yes, it does, but
> lxcDomainStart(), holds the mutex on 'dom' for the duration of
> its execution, and does not return until the container is running
> and cgroups are present.
No. It calls the lxc_controller with --background. The controller's
main task in turn exits before the cgroups have been set up. That
is the race.
So what is the right fix here? Should the controller write out another
file once it is past the part which should be locked, and the driver
wait for that file to exist before it drops the driver mutex? If we
do that, do we risk having the driver hang when the controller has
hung?
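
To make the hang question concrete, here is a rough sketch of the waiting
side, in Python only for brevity (the real driver is C). It assumes the
controller signals readiness by creating a file; the function name
wait_for_ready_file, the timeout values, and the is_alive callback are all
hypothetical and not existing libvirt code. The point is just that a bounded
wait plus a liveness check would keep a hung or dead controller from
blocking the driver forever.

```python
import os
import time


def wait_for_ready_file(path, timeout=5.0, poll_interval=0.05, is_alive=None):
    """Wait until `path` exists.

    Returns True once the file is present. Returns False if the timeout
    expires, or if the optional `is_alive` callback reports that the
    process expected to create the file is gone.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Readiness marker is present: the writer got past its setup.
        if os.path.exists(path):
            return True
        # The writer died before creating the marker: stop waiting.
        if is_alive is not None and not is_alive():
            return False
        # Bounded wait, so a hung writer cannot block the caller forever.
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

With something like this, the driver would hold the mutex for at most the
timeout, and a False return could be reported as a failed start instead of
a hang.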
-serge