On Fri, Sep 30, 2011 at 11:00:53AM -0500, Serge Hallyn wrote:
> Quoting Daniel P. Berrange (berrange@redhat.com):
> > On Thu, Sep 29, 2011 at 10:12:17PM -0500, Serge E. Hallyn wrote:
> > > Quoting Daniel P. Berrange (berrange@redhat.com):
> > > > On Wed, Sep 28, 2011 at 02:14:52PM -0500, Serge E. Hallyn wrote:
> > > > > Nova (openstack) calls libvirt to create a container, then
> > > > > periodically checks using GetInfo to see whether the container
> > > > > is up. If it does this too quickly, then libvirt returns an
> > > > > error, which in libvirt.py causes an exception to be raised,
> > > > > the same type as if the container were bad.
> > > > lxcDomainGetInfo() holds a mutex on 'dom' for the duration of
> > > > its execution. It checks virDomainObjIsActive() before
> > > > trying to use the cgroups.
> > >
> > > Yes, it does, but
> > >
> > > > lxcDomainStart() holds the mutex on 'dom' for the duration of
> > > > its execution, and does not return until the container is running
> > > > and cgroups are present.
> > >
> > > No. It calls the lxc_controller with --background. The controller's
> > > main task in turn exits before the cgroups have been set up. There
> > > is the race.
> >
> > The lxcDomainStart() method isn't actually waiting on the child
> > pid directly, so the --background flag ought not to matter. We
> > have a pipe that we pass into the controller, which we wait on
> > for a notification after running the process. The controller
> > does not notify the 'handshake' FD until after cgroups have
> > been set up, unless I'm mis-interpreting our code.
>
> That's the call to lxcContainerWaitForContinue(), right? If so, that's
> done by lxcContainerChild(), which is called by the lxc_controller.
> AFAICS there is nothing in the lxc_driver which will wait on that
> before dropping the driver->lock mutex.
In lxcVmStart(), which runs while driver->lock is held, we have the
following section of code in play:
    ....
    if (virCommandRun(cmd, NULL) < 0)
        goto cleanup;

    if (VIR_CLOSE(handshakefds[1]) < 0) {
        virReportSystemError(errno, "%s",
                             _("could not close handshake fd"));
        goto cleanup;
    }

    /* Connect to the controller as a client *first* because
     * this will block until the child has written their
     * pid file out to disk */
    if ((priv->monitor = lxcMonitorClient(driver, vm)) < 0)
        goto cleanup;

    /* And get its pid */
    if ((r = virPidFileRead(driver->stateDir, vm->def->name, &vm->pid)) < 0) {
        virReportSystemError(-r,
                             _("Failed to read pid file %s/%s.pid"),
                             driver->stateDir, vm->def->name);
        goto cleanup;
    }

    vm->def->id = vm->pid;
    virDomainObjSetState(vm, VIR_DOMAIN_RUNNING, reason);

    if (lxcContainerWaitForContinue(handshakefds[0]) < 0) {
    ....
The 'virCommandRun' is where the libvirt_lxc controller is forked (in
the background). The main libvirt LXC driver code then blocks on the
'lxcContainerWaitForContinue(handshakefds[0])' line, waiting for the
controller to finish initializing.
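To make the pattern concrete, here is a minimal, self-contained sketch
of that handshake (simplified; the real lxcContainerSendContinue() /
lxcContainerWaitForContinue() helpers pass a small message over the
pipe rather than a bare byte, and the "controller" work is of course
not a sleep):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Simplified stand-ins for lxcContainerSendContinue() /
 * lxcContainerWaitForContinue() */
static int sendContinue(int fd)
{
    char msg = 0;
    return write(fd, &msg, 1) == 1 ? 0 : -1;
}

static int waitForContinue(int fd)
{
    char msg;
    /* Blocks until the peer writes its "continue" byte */
    return read(fd, &msg, 1) == 1 ? 0 : -1;
}

int main(void)
{
    int handshakefds[2];

    if (pipe(handshakefds) < 0)
        return EXIT_FAILURE;

    if (fork() == 0) {          /* "controller" side */
        sleep(2);               /* stands in for cgroup & container setup */
        sendContinue(handshakefds[1]);
        _exit(0);
    }

    /* "driver" side: does not proceed until setup has completed,
     * regardless of the controller running in the background */
    if (waitForContinue(handshakefds[0]) < 0)
        return EXIT_FAILURE;
    printf("controller finished initializing\n");
    return EXIT_SUCCESS;
}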
The LXC controller's 'main' method receives the handshake FD and invokes
lxcControllerRun(). This method does various setup tasks, in particular
the following:
    ....
    if (lxcSetContainerResources(def) < 0)
        goto cleanup;
    ...
    if ((container = lxcContainerStart(def,
                                       nveths,
                                       veths,
                                       control[1],
                                       containerhandshake[1],
                                       containerPtyPath)) < 0)
        goto cleanup;
    VIR_FORCE_CLOSE(control[1]);
    VIR_FORCE_CLOSE(containerhandshake[1]);

    if (lxcControllerMoveInterfaces(nveths, veths, container) < 0)
        goto cleanup;

    if (lxcContainerSendContinue(control[0]) < 0) {
        virReportSystemError(errno, "%s",
                             _("Unable to send container continue message"));
        goto cleanup;
    }

    if (lxcContainerWaitForContinue(containerhandshake[0]) < 0) {
        virReportSystemError(errno, "%s",
                             _("error receiving signal from container"));
        goto cleanup;
    }

    /* Now the container is fully setup... */
    ....
    /* ...and reduce our privileges */
    if (lxcControllerClearCapabilities() < 0)
        goto cleanup;

    if (lxcContainerSendContinue(handshakefd) < 0) {
        virReportSystemError(errno, "%s",
                             _("error sending continue signal to parent"));
        goto cleanup;
    }
    VIR_FORCE_CLOSE(handshakefd);
lxcSetContainerResources() is what creates the cgroups. The very last
thing we do, once everything is configured and the 'init' process is
finally running, is to notify the handshake FD with
'lxcContainerSendContinue(handshakefd)', which finally allows the
libvirtd LXC code in 'lxcVmStart' to continue running and return to the
client. Only at that point is it possible for other API calls to be
made which touch the cgroups.
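In other words, because the domain lock is held across the whole
handshake, a concurrent GetInfo caller serializes behind the start
path. A toy model of that argument (not libvirt code; the names,
sleeps, and flags are illustrative only):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t dom_lock = PTHREAD_MUTEX_INITIALIZER;
static int active = 0, cgroup_exists = 0;

static void *start(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&dom_lock);
    active = 1;            /* state set before cgroups, as in lxcVmStart */
    sleep(1);              /* stands in for the controller's cgroup setup */
    cgroup_exists = 1;     /* the final handshake byte arrives here */
    pthread_mutex_unlock(&dom_lock);
    return NULL;
}

static void *getinfo(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&dom_lock);
    if (active && !cgroup_exists)
        printf("BUG: would read missing cgroup\n");  /* never reached */
    else
        printf("consistent: active=%d cgroup=%d\n", active, cgroup_exists);
    pthread_mutex_unlock(&dom_lock);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, start, NULL);
    usleep(100 * 1000);    /* let the start path grab the lock first */
    pthread_create(&t2, NULL, getinfo, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

(Build with 'gcc -pthread'; the getinfo thread always observes a
consistent state because it cannot take the lock mid-setup.)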
I'm trying to reproduce the kind of race condition you describe using
the following sequence of commands:

  # virsh -c lxc:/// start helloworld ; virsh -c lxc:/// dominfo helloworld ; virsh -c lxc:/// destroy helloworld
Domain helloworld started

Id:             7015
Name:           helloworld
UUID:           a099376e-a803-ca94-f99c-d9a8f9a30088
OS Type:        exe
State:          running
CPU(s):         1
CPU time:       0.0s
Max memory:     102400 kB
Used memory:    280 kB
Persistent:     yes
Autostart:      disable
Managed save:   unknown

Domain helloworld destroyed
Even if I add a 'sleep(10)' statement as the first line of the
lxcSetContainerResources() method, I can't seem to trigger any race
wrt cgroup creation & virsh dominfo.

Is there a better test case you can show me to reproduce what you're
seeing?
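For a tighter probe than the virsh pipeline above (which pays fork/exec
costs between the calls), a minimal C client along these lines could
issue GetInfo immediately after start; this is only a sketch, assuming
a defined LXC domain named 'helloworld':

#include <stdio.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("lxc:///");
    virDomainPtr dom = NULL;
    virDomainInfo info;
    int ret = 1;

    if (!conn)
        return 1;

    if (!(dom = virDomainLookupByName(conn, "helloworld")))
        goto cleanup;

    /* Start the container, then query it with no delay in between,
     * to probe for a window where the domain is active but its
     * cgroup is not yet usable */
    if (virDomainCreate(dom) < 0)
        goto cleanup;
    if (virDomainGetInfo(dom, &info) < 0) {
        fprintf(stderr, "GetInfo failed immediately after start\n");
        goto cleanup;
    }
    printf("state=%d maxMem=%lu kB\n", info.state, info.maxMem);

    virDomainDestroy(dom);
    ret = 0;

cleanup:
    if (dom)
        virDomainFree(dom);
    virConnectClose(conn);
    return ret;
}

(Build with 'gcc probe.c -lvirt' and run it in a loop.)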
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|