
Quoting Daniel P. Berrange (berrange@redhat.com):
> On Wed, Sep 28, 2011 at 02:14:52PM -0500, Serge E. Hallyn wrote:
> > Nova (OpenStack) calls libvirt to create a container, then periodically checks with GetInfo to see whether the container is up. If it polls too quickly, libvirt returns an error, which in libvirt.py raises an exception of the same type it would raise if the container were actually broken.
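(For reference, the Nova-side check is essentially the loop below. This is my C paraphrase of the pattern, not the actual Nova source, which is Python; the function name wait_for_domain, the domain name "instance-00000001", and the retry count are all made up for illustration.)

    #include <stdio.h>
    #include <unistd.h>
    #include <libvirt/libvirt.h>

    /* C paraphrase of the Nova polling pattern, not the actual (Python)
     * code. conn would come from virConnectOpen("lxc:///"), and name
     * would be something like "instance-00000001". */
    static int wait_for_domain(virConnectPtr conn, const char *name)
    {
        virDomainPtr dom = virDomainLookupByName(conn, name);
        virDomainInfo info;
        int i;

        if (!dom)
            return -1;

        for (i = 0; i < 10; i++) {
            if (virDomainGetInfo(dom, &info) == 0 &&
                info.state == VIR_DOMAIN_RUNNING) {
                virDomainFree(dom);
                return 0;   /* container is up */
            }
            sleep(1);       /* error or not running yet: retry */
        }
        virDomainFree(dom); /* gave up; caller treats this as a failed boot */
        return -1;
    }

A loop like this would just retry a transient GetInfo failure; the trouble is that libvirt.py surfaces the early failure as the same exception a genuinely dead container produces, so the caller can't tell the two cases apart.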
> lxcDomainGetInfo() holds a mutex on 'dom' for the duration of its execution. It checks virDomainObjIsActive() before trying to use the cgroups.
> lxcDomainStart() holds the mutex on 'dom' for the duration of its execution, and does not return until the container is running and its cgroups are present.
Yup, now that you mention it, I do see that. So this shouldn't be happening. I can't explain it, but copious fprintf debugging still suggests it is :) Is it possible that vm->def->id is not being set to -1 when the domain is first defined, and I'm catching it between define and start? I would think that would show up as much more broken, but then I'm also not seeing where vm->def->id gets set to -1 during domain definition. Well, I'll keep digging. Thanks for setting me straight on the mutex!
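(For anyone following along: as far as I can remember, the active check really is just the id field. Roughly, from src/conf/domain_conf.h, quoted from memory and possibly not verbatim:)

    /* My recollection of the libvirt helper, possibly not verbatim:
     * a domain counts as active iff it has been assigned an id. */
    static inline int
    virDomainObjIsActive(virDomainObjPtr dom)
    {
        return dom->def->id != -1;
    }

So if vm->def->id somehow became non-negative between define and start, GetInfo would take the cgroup path before the cgroup exists, which would match what I'm seeing.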
> Similarly, when we delete the cgroups, we again hold the lock on 'dom'.
> Thus any time virDomainObjIsActive() returns true, AFAICT, we have a guarantee that the cgroup does in fact exist.
> So I can't see how control gets to the 'else' part of this condition if the cgroups don't exist as you describe:
>     if (!virDomainObjIsActive(vm) || driver->cgroup == NULL) {
>         info->cpuTime = 0;
>         info->memory = vm->def->mem.cur_balloon;
>     } else {
>         if (virCgroupForDomain(driver->cgroup, vm->def->name, &cgroup, 0) != 0) {
>             lxcError(VIR_ERR_INTERNAL_ERROR,
>                      _("Unable to get cgroup for %s"), vm->def->name);
>             goto cleanup;
>         }
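Right, and restating that invariant in toy form to convince myself (a minimal standalone sketch, not libvirt code; the struct and all names are made up):

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Toy model of the invariant: 'active' (standing in for
     * vm->def->id != -1) and 'cgroup_exists' only ever change
     * together, under the same lock that readers take. */
    struct toy_dom {
        pthread_mutex_t lock;
        bool active;
        bool cgroup_exists;
    };

    /* Start path: create the cgroup, then mark active, under the lock. */
    static void toy_start(struct toy_dom *d)
    {
        pthread_mutex_lock(&d->lock);
        d->cgroup_exists = true;
        d->active = true;
        pthread_mutex_unlock(&d->lock);
    }

    /* GetInfo path: because the check runs under the same lock, it can
     * never observe active == true with cgroup_exists == false. */
    static void toy_get_info(struct toy_dom *d)
    {
        pthread_mutex_lock(&d->lock);
        if (d->active && !d->cgroup_exists)
            printf("impossible under this locking scheme\n");
        pthread_mutex_unlock(&d->lock);
    }

    int main(void)
    {
        struct toy_dom d = { PTHREAD_MUTEX_INITIALIZER, false, false };
        toy_start(&d);
        toy_get_info(&d);
        return 0;
    }

Given that, the only ways I can see to hit the error are the lock not actually being held on one of the paths, or vm->def->id being set before the cgroup is created.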
> What libvirt version were you seeing this behaviour with?
0.9.2

thanks,
-serge