Quoting Daniel P. Berrange (berrange(a)redhat.com):
> On Thu, Sep 29, 2011 at 10:12:17PM -0500, Serge E. Hallyn wrote:
> > Quoting Daniel P. Berrange (berrange(a)redhat.com):
> > > On Wed, Sep 28, 2011 at 02:14:52PM -0500, Serge E. Hallyn wrote:
> > > > Nova (openstack) calls libvirt to create a container, then
> > > > periodically checks using GetInfo to see whether the container
> > > > is up. If it does this too quickly, then libvirt returns an
> > > > error, which in libvirt.py causes an exception to be raised,
> > > > the same type as if the container was bad.
> > > lxcDomainGetInfo(), holds a mutex on 'dom' for the duration of
> > > its execution. It checks for virDomainObjIsActive() before
> > > trying to use the cgroups.
> >
> > Yes, it does, but
> >
> > > lxcDomainStart(), holds the mutex on 'dom' for the duration of
> > > its execution, and does not return until the container is running
> > > and cgroups are present.
> >
> > No. It calls the lxc_controller with --background. The controller
> > main task in turn exits before the cgroups have been set up. There
> > is the race.
>
> The lxcDomainStart() method isn't actually waiting on the child
> pid directly, so the --background flag ought not to matter. We
> have a pipe that we pass into the controller, which we wait on
> for a notification after running the process. The controller
> does not notify the 'handshake' FD until after cgroups have
> been set up, unless I'm misinterpreting our code.

That's the call to lxcContainerWaitForContinue(), right? If so, that's
done by lxcContainerChild(), which is called by the lxc_controller.
AFAICS there is nothing in the lxc_driver which will wait on that
before dropping the driver->lock mutex.
-serge
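
For readers following the thread, here is a minimal sketch of the kind of pipe handshake being discussed: the parent blocks on a pipe until the child signals that its setup is done. This is not libvirt code. The function name start_with_handshake, the setup_seconds parameter, and the sleep standing in for cgroup setup are all hypothetical, and it assumes a POSIX system where os.fork() is available.

```python
import os
import time


def start_with_handshake(setup_seconds=0.2):
    """Fork a child and block until it reports that its setup is complete.

    Returns (ok, waited): ok is True if the child sent its notification,
    waited is how long the parent was blocked, in seconds.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: stands in for the controller process.
        os.close(read_fd)
        time.sleep(setup_seconds)      # hypothetical stand-in for cgroup setup
        os.write(write_fd, b"1")       # notify the parent only after setup
        os.close(write_fd)
        os._exit(0)

    # Parent: stands in for the driver's start path.
    os.close(write_fd)                 # so read() sees EOF if the child dies early
    started = time.monotonic()
    token = os.read(read_fd, 1)        # blocks until the child writes or exits
    waited = time.monotonic() - started
    os.close(read_fd)
    os.waitpid(pid, 0)                 # reap the child
    return token == b"1", waited


if __name__ == "__main__":
    ok, waited = start_with_handshake()
    print("child ready:", ok, "after %.2fs" % waited)
```

If the parent returned (and released its lock) without the os.read() call, a caller querying state right away could run before the child's setup had finished, which is the shape of the race described above.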