Hi Daniel,
On Sun, Dec 21, 2008 at 6:15 PM, Daniel P. Berrange
<berrange@redhat.com> wrote:
>> However, I still have the crash in the dmesg output, as before, errors like:
>>
>> sysfs: duplicate filename '0' can not be created
>> ------------[ cut here -------------
>> WARNING: at fs/sysfs/dir.c:424 sysfs_add_one+0x34/0xa6()
>> ...
>> Pid: 2616, comm: libvirtd Tainted: P 2.6.25.20-113 #1
>> ...
>> *kobject_add_internal failed for 0 with -EEXIST, don't try to register
>> things with the same name in the same directory.*
>
> Any of these messages in the dmesg output are kernel problems, not
> libvirt problems. The process listing you show above indicates that
> libvirtd itself is running, and has not crashed.

Using Eclipse and gdb, I have traced the problem to line 654 of lxc_container.c:

    cpid = clone(lxcContainerDummyChild, childStack, flags, NULL);

This call triggers something inside my kernel that produces the messages I sent earlier in this thread. Sometimes the kernel messages appear even though the value of cpid is 0. On the second attempt (libvirtd keeps running despite the kernel messages), cpid is -1 and the debug message from this function is printed:

    DEBUG("clone call returned %s, container support is not enabled",
          strerror(errno));

The full function is below. The line that triggers the error is marked with a comment:

    int lxcContainerAvailable(int features)
    {
        int flags = CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWUSER|
            CLONE_NEWIPC|SIGCHLD;
        int cpid;
        char *childStack;
        char *stack;
        int childStatus;

        if (features & LXC_CONTAINER_FEATURE_NET)
            flags |= CLONE_NEWNET;

        if (VIR_ALLOC_N(stack, getpagesize() * 4) < 0) {
            DEBUG0("Unable to allocate stack");
            return -1;
        }

        childStack = stack + (getpagesize() * 4);

        /* <-- the kernel messages appear when this call runs */
        cpid = clone(lxcContainerDummyChild, childStack, flags, NULL);
        VIR_FREE(stack);
        if (cpid < 0) {
            DEBUG("clone call returned %s, container support is not enabled",
                  strerror(errno));
            return -1;
        } else {
            waitpid(cpid, &childStatus, 0);
        }

        return 0;
    }

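One way to narrow this down might be to try each namespace flag on its own outside libvirtd and watch dmesg after every call. The following is only a minimal sketch of that idea, not libvirt code: probe_child and probe_clone_flags are hypothetical names, and it assumes the glibc clone() wrapper and that the CLONE_NEW* constants are defined by the installed headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child entry point: exit immediately, much like a dummy child would. */
static int probe_child(void *arg)
{
    (void)arg;
    _exit(0);
}

/*
 * Hypothetical helper (not part of libvirt): call clone() with SIGCHLD plus
 * the given namespace flags. Returns 0 if the child was created and reaped,
 * or the errno value reported by clone() on failure.
 */
static int probe_clone_flags(int ns_flags)
{
    size_t stack_size = (size_t)getpagesize() * 4;
    char *stack = malloc(stack_size);
    pid_t cpid;
    int saved_errno;

    if (stack == NULL)
        return ENOMEM;

    /* The stack grows downwards on most architectures, so pass its top. */
    cpid = clone(probe_child, stack + stack_size, ns_flags | SIGCHLD, NULL);
    saved_errno = errno;

    if (cpid < 0) {
        free(stack);
        return saved_errno;
    }

    /* Reap the child, then release the stack. */
    while (waitpid(cpid, NULL, 0) < 0 && errno == EINTR)
        ;
    free(stack);
    return 0;
}
```

Calling probe_clone_flags() as root with CLONE_NEWPID, CLONE_NEWNS, CLONE_NEWUTS, CLONE_NEWUSER, CLONE_NEWIPC and CLONE_NEWNET one at a time, and checking dmesg between calls, may show which flag is associated with the sysfs warning. A non-zero return value can be passed to strerror() to see why the kernel refused the request.
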
I would appreciate any clues as to why this happens, and what I should change in the host kernel to prevent it.
Thank you very much.
Emre