[libvirt] Segfault in virDomainObjListSearchName when listing domains (qemu backend)

Hi,

I'm seeing a crash in libvirt when trying to list all domains using virsh. Here's the backtrace:

=====================
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffeebfd710 (LWP 1691)]
0x00007ffff7411746 in __strcmp_sse42 () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install openssl-1.0.0a-1.fc12.x86_64
(gdb) bt
#0  0x00007ffff7411746 in __strcmp_sse42 () from /lib64/libc.so.6
#1  0x00007ffff7ac9d79 in virDomainObjListSearchName (payload=0x73fdd0, name=<value optimized out>, data=0x7fffdc0008c0) at conf/domain_conf.c:367
#2  0x00007ffff7ab476e in virHashSearch (table=0x6f9c30, iter=0x7ffff7ac9d50 <virDomainObjListSearchName>, data=0x7fffdc0008c0) at util/hash.c:582
#3  0x00007ffff7ac9d33 in virDomainFindByName (doms=<value optimized out>, name=0x7fffdc0008c0 "basiccentos54image") at conf/domain_conf.c:377
#4  0x00000000004430f6 in qemudDomainLookupByName (conn=0x7fffe8000a80, name=0x7fffdc0008c0 "basiccentos54image") at qemu/qemu_driver.c:4166
#5  0x00007ffff7af95cd in virDomainLookupByName (conn=0x7fffe8000a80, name=0x7fffdc0008c0 "basiccentos54image") at libvirt.c:2169
#6  0x0000000000423e64 in remoteDispatchDomainLookupByName (server=<value optimized out>, client=<value optimized out>, conn=0x7fffe8000a80, hdr=<value optimized out>, rerr=0x7fffeebfcc70, args=<value optimized out>, ret=0x7fffeebfcbc0) at remote.c:2030
#7  0x0000000000426a91 in remoteDispatchClientCall (server=<value optimized out>, client=0x7ffff0001300, msg=0x7ffff0041570) at dispatch.c:506
#8  0x0000000000426e43 in remoteDispatchClientRequest (server=0x6e3cd0, client=0x7ffff0001300, msg=0x7ffff0041570) at dispatch.c:388
#9  0x0000000000417ed8 in qemudWorker (data=0x7ffff0000908) at libvirtd.c:1568
#10 0x00007ffff7878a3a in start_thread () from /lib64/libpthread.so.0
#11 0x00007ffff73d377d in clone () from /lib64/libc.so.6
#12 0x0000000000000000 in ?? ()
(gdb)
=====================

This is with the newest version from git, pulled about 30 minutes ago.

This happens when I try to start one of the defined domains, using either libvirt or virsh, and then try to list all the defined domains in virsh using list --all. The attempt to start one of the domains already fails with the following output:

=====================
virsh # start testserver-a
error: Failed to start domain testserver-a
error: internal error process exited while connecting to monitor:
17:21:14.760: debug : virCgroupNew:542 : New group /libvirt/qemu/testserver-a
17:21:14.760: debug : virCgroupDetect:232 : Detected mount/mapping 0:cpu at /mnt/cgroups/cpu in /sysdefault
17:21:14.760: debug : virCgroupDetect:232 : Detected mount/mapping 1:cpuacct at /mnt/cgroups/cpuacct in /sysdefault
17:21:14.760: debug : virCgroupDetect:232 : Detected mount/mapping 3:memory at /mnt/cgroups/memory in /sysdefault
17:21:14.760: debug : virCgroupDetect:232 : Detected mount/mapping 4:devices at /mnt/cgroups/devices in /sysdefault
17:21:14.760: debug : virCgroupMakeGroup:484 : Make group /libvirt/qemu/testserver-a
17:21:14.760: debug : virCgroupMakeGroup:496 : Make controller /mnt/cgroups/cpu/sysdefault/libvirt/qemu/testserver-a/
17:21:14.760: debug : virCgroupMakeGroup:496 : Make controller /mnt/cgroups/cpuacct/sysdefault/libvirt/qemu/testserver-a/
17:21:14.760: debug : virCgroupMakeGroup:496 : Make controller /mnt/cgroups/memory/sysdefault/libvirt/qemu/testserver-a/
=====================

This happens with all the domains I have currently defined. Calling list --all before that produces no problems. Calling list --all after that always produces said crash.

qemu is qemu-kvm 0.12.4, built from sources. The host system is a Fedora 12 install.

Guido

On Fri, Jun 25, 2010 at 05:34:56PM +0200, Guido Winkelmann wrote:
Hi,
I'm seeing a crash in libvirt when trying to list all domains using virsh. Here's the backtrace:
=====================
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffeebfd710 (LWP 1691)]
0x00007ffff7411746 in __strcmp_sse42 () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff7411746 in __strcmp_sse42 () from /lib64/libc.so.6
#1  0x00007ffff7ac9d79 in virDomainObjListSearchName (payload=0x73fdd0, name=<value optimized out>, data=0x7fffdc0008c0) at conf/domain_conf.c:367
[snip]
=====================
This is with the newest version from git, pulled about 30 minutes ago.
This happens when I try to start up one of the defined domains using either libvirt or virsh and then try to list all the defined domains in virsh using list --all.
The attempt to start one of the domains already fails with the following output:

=====================
virsh # start testserver-a
error: Failed to start domain testserver-a
error: internal error process exited while connecting to monitor:
Can you provide the XML config for this guest, and the associated /var/log/libvirt/qemu/testserver-a.log file?
17:21:14.760: debug : virCgroupNew:542 : New group /libvirt/qemu/testserver-a
[snip]
=====================
This happens with all the domains I have currently defined.
Calling list --all before that produces no problems.
Calling list --all after that always produces said crash.
qemu is qemu-kvm 0.12.4, built from sources.
The host system is a Fedora 12 install.
Daniel

On Friday, 25 June 2010, Daniel P. Berrange wrote:
On Fri, Jun 25, 2010 at 05:34:56PM +0200, Guido Winkelmann wrote: [...]
This is with the newest version from git, pulled about 30 minutes ago.
This happens when I try to start up one of the defined domains using either libvirt or virsh and then try to list all the defined domains in virsh using list --all.
The attempt to start one of the domains already fails with the following output:

=====================
virsh # start testserver-a
error: Failed to start domain testserver-a
error: internal error process exited while connecting to monitor:
Can you provide the XML config for this guest, and the associated /var/log/libvirt/qemu/testserver-a.log file?
Sure. This is the guest's XML config:

=====================
<domain type='kvm'>
  <name>testserver-a</name>
  <uuid>ce57ebe0-ea37-0353-bbe0-e23dbbd17708</uuid>
  <description>libvirt C-API Test</description>
  <memory>262144</memory>
  <currentMemory>262144</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc-0.13'>hvm</type>
    <boot dev='hd'/>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <source file='/data/testserver-a-system.img'/>
      <target dev='sdb' bus='scsi'/>
      <address type='drive' controller='0' bus='0' unit='1'/>
    </disk>
    <disk type='file' device='disk'>
      <source file='/data/testserver-a-data1.img'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <source file='/data/testserver-a-data2.img'/>
      <target dev='vdc' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <source file='/data/gentoo-install-amd64-minimal-20100408.iso'/>
      <target dev='hda' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' unit='0'/>
    </disk>
    <disk type='file' device='floppy'>
      <source file='/data/testserver-a_configfloppy.img'/>
      <target dev='fd0' bus='fdc'/>
      <address type='drive' controller='0' bus='0' unit='0'/>
    </disk>
    <controller type='scsi' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <controller type='fdc' index='0'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:84:6d:69'/>
      <source bridge='virbr1'/>
      <model type='e1000'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' keymap='de' passwd='blabla'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
  </devices>
</domain>
=====================

(as found in /usr/local/etc/libvirt/qemu/)

and here are the last few lines from the logfile (/usr/local/var/log/libvirt/qemu/testserver-a.log):

=====================
LC_ALL=C PATH=/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin HOME=/root USER=root LOGNAME=root QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.13 -enable-kvm -m 256 -smp 1,sockets=1,cores=1,threads=1 -name testserver-a -uuid ce57ebe0-ea37-0353-bbe0-e23dbbd17708 -nodefaults -chardev socket,id=monitor,path=/usr/local/var/lib/libvirt/qemu/testserver-a.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -no-acpi -boot c -device lsi,id=scsi0,bus=pci.0,addr=0x7 -drive file=/data/testserver-a-system.img,if=none,id=drive-scsi0-0-1,boot=on -device scsi-disk,bus=scsi0.0,scsi-id=1,drive=drive-scsi0-0-1,id=scsi0-0-1 -drive file=/data/testserver-a-data1.img,if=none,id=drive-virtio-disk1 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/data/testserver-a-data2.img,if=none,id=drive-virtio-disk2 -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk2,id=virtio-disk2 -drive file=/data/gentoo-install-amd64-minimal-20100408.iso,if=none,media=cdrom,id=drive-ide0-0-0,readonly=on -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive file=/data/testserver-a_configfloppy.img,if=none,id=drive-fdc0-0-0 -global isa-fdc.driveA=drive-fdc0-0-0 -device e1000,vlan=0,id=net0,mac=52:54:00:84:6d:69,bus=pci.0,addr=0x6 -net tap,fd=22,vlan=0,name=hostnet0 -usb -vnc 127.0.0.1:1,password -k de -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
17:21:14.760: debug : virCgroupNew:542 : New group /libvirt/qemu/testserver-a
17:21:14.760: debug : virCgroupDetect:232 : Detected mount/mapping 0:cpu at /mnt/cgroups/cpu in /sysdefault
17:21:14.760: debug : virCgroupDetect:232 : Detected mount/mapping 1:cpuacct at /mnt/cgroups/cpuacct in /sysdefault
17:21:14.760: debug : virCgroupDetect:232 : Detected mount/mapping 3:memory at /mnt/cgroups/memory in /sysdefault
17:21:14.760: debug : virCgroupDetect:232 : Detected mount/mapping 4:devices at /mnt/cgroups/devices in /sysdefault
17:21:14.760: debug : virCgroupMakeGroup:484 : Make group /libvirt/qemu/testserver-a
17:21:14.760: debug : virCgroupMakeGroup:496 : Make controller /mnt/cgroups/cpu/sysdefault/libvirt/qemu/testserver-a/
17:21:14.760: debug : virCgroupMakeGroup:496 : Make controller /mnt/cgroups/cpuacct/sysdefault/libvirt/qemu/testserver-a/
17:21:14.760: debug : virCgroupMakeGroup:496 : Make controller /mnt/cgroups/memory/sysdefault/libvirt/qemu/testserver-a/
17:21:14.760: debug : virCgroupMakeGroup:496 : Make controller /mnt/cgroups/devices/sysdefault/libvirt/qemu/testserver-a/
17:21:14.760: debug : virCgroupSetValueStr:277 : Set value '/mnt/cgroups/cpu/sysdefault/libvirt/qemu/testserver-a/tasks' to '1761'
17:21:14.770: debug : virCgroupSetValueStr:277 : Set value '/mnt/cgroups/cpuacct/sysdefault/libvirt/qemu/testserver-a/tasks' to '1761'
17:21:14.778: debug : virCgroupSetValueStr:277 : Set value '/mnt/cgroups/memory/sysdefault/libvirt/qemu/testserver-a/tasks' to '1761'
17:21:14.789: debug : virCgroupSetValueStr:277 : Set value '/mnt/cgroups/devices/sysdefault/libvirt/qemu/testserver-a/tasks' to '1761'
17:21:14.802: debug : qemudInitCpuAffinity:2227 : Setting CPU affinity
17:21:14.803: debug : qemuSecurityDACSetProcessLabel:546 : Dropping privileges of VM to 0:0
Supported machines are:
pc         Standard PC (alias of pc-0.12)
pc-0.12    Standard PC (default)
pc-0.11    Standard PC, qemu 0.11
pc-0.10    Standard PC, qemu 0.10
isapc      ISA-only PC
=====================

Guido

On Fri, Jun 25, 2010 at 06:08:07PM +0200, Guido Winkelmann wrote:
This is the guest's XML config:

=====================
<domain type='kvm'>
  <name>testserver-a</name>
  <uuid>ce57ebe0-ea37-0353-bbe0-e23dbbd17708</uuid>
  <description>libvirt C-API Test</description>
  <memory>262144</memory>
  <currentMemory>262144</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc-0.13'>hvm</type>
    <boot dev='hd'/>
[snip]
=====================
(as found in /usr/local/etc/libvirt/qemu/)
and here are the last few lines from the logfile (/usr/local/var/log/libvirt/qemu/testserver-a.log):
Have you set any LIBVIRT_LOG environment variables or set any of the logging settings in /etc/libvirt/libvirtd.conf ? If so, which ones ? Your log file contains a lot of stuff I wouldn't normally expect to be there & I wonder if there is a verbose logging level enabled that is causing a buffer overflow somewhere ?
=====================
LC_ALL=C PATH=/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin HOME=/root USER=root LOGNAME=root QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.13 -enable-kvm -m 256 -smp 1,sockets=1,cores=1,threads=1 -name testserver-a -uuid ce57ebe0-ea37-0353-bbe0-e23dbbd17708 -nodefaults [snip]
17:21:14.802: debug : qemudInitCpuAffinity:2227 : Setting CPU affinity
17:21:14.803: debug : qemuSecurityDACSetProcessLabel:546 : Dropping privileges of VM to 0:0
Supported machines are:
pc         Standard PC (alias of pc-0.12)
pc-0.12    Standard PC (default)
pc-0.11    Standard PC, qemu 0.11
pc-0.10    Standard PC, qemu 0.10
isapc      ISA-only PC
This is the ultimate problem - your guest XML lists 'pc-0.13', so I assume you must have deployed this guest with a newer QEMU and are now trying to start it with an older one. Changing the XML to list one of these valid machine types should let you start the guest.

Daniel
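For anyone hitting the same mismatch, the supported machine types can be checked directly against the installed emulator before fixing the guest XML. A minimal sketch (the '-M ?' spelling is the qemu-kvm 0.12-era one, and virsh edit opens the persistent definition; paths are the ones from this thread):

=====================
# List the machine types this QEMU binary was built with
/usr/bin/qemu-kvm -M ?

# Change <type ... machine='pc-0.13'> to a listed type such as
# 'pc-0.12', or drop the machine attribute entirely so libvirt
# picks the emulator's default
virsh edit testserver-a
=====================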

On Friday, 25 June 2010, Daniel P. Berrange wrote:
On Fri, Jun 25, 2010 at 06:08:07PM +0200, Guido Winkelmann wrote:
This is the guest's XML config:

=====================
<domain type='kvm'>
  <name>testserver-a</name>
  <uuid>ce57ebe0-ea37-0353-bbe0-e23dbbd17708</uuid>
  <description>libvirt C-API Test</description>
  <memory>262144</memory>
  <currentMemory>262144</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64' machine='pc-0.13'>hvm</type>
    <boot dev='hd'/>
[snip]
=====================
(as found in /usr/local/etc/libvirt/qemu/)
and here are the last few lines from the logfile (/usr/local/var/log/libvirt/qemu/testserver-a.log):
Have you set any LIBVIRT_LOG environment variables or set any of the logging settings in /etc/libvirt/libvirtd.conf ? If so, which ones ? Your log file contains a lot of stuff I wouldn't normally expect to be there & I wonder if there is a verbose logging level enabled that is causing a buffer overflow somewhere ?
I have log_level = 1 in /usr/local/etc/libvirt/libvirtd.conf. [...]
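For context, log_level = 1 is the most verbose DEBUG level, which is what pushes all of the virCgroup* chatter into the per-domain qemu log. A sketch of the relevant libvirtd.conf knobs, with option names and example values as documented in the sample config shipped in this era:

=====================
# 4 = errors, 3 = warnings, 2 = information, 1 = debug
log_level = 1

# A narrower alternative: debug only the qemu driver and
# route messages to syslog instead of the domain logs
#log_filters = "1:qemu"
#log_outputs = "3:syslog:libvirtd"
=====================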
17:21:14.802: debug : qemudInitCpuAffinity:2227 : Setting CPU affinity
17:21:14.803: debug : qemuSecurityDACSetProcessLabel:546 : Dropping privileges of VM to 0:0
Supported machines are:
pc         Standard PC (alias of pc-0.12)
pc-0.12    Standard PC (default)
pc-0.11    Standard PC, qemu 0.11
pc-0.10    Standard PC, qemu 0.10
isapc      ISA-only PC
This is the ultimate problem - your guest XML lists 'pc-0.13', so I assume you must have deployed this guest with a newer QEMU and are now trying to start it with an older one.
Could be... I've been switching from qemu-kvm release 0.12.4 to git and back a couple of times, mostly because libvirt used to silently require support of the "-nodefconfig" option in qemu until a couple of commits ago...
Changing the XML to list one of these valid machine types should let you start the guest
Undefining and redefining the guest domains (without an explicit machine type) made it possible to start them again. IMHO, though, the segfault is a separate issue and still should not have happened.

Guido

I'm still seeing segfaults in virDomainObjListSearchName, except now they're no longer easily reproducible:

=====================
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffeffff710 (LWP 5446)]
0x00007ffff7411746 in __strcmp_sse42 () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.11.2-1.x86_64 nss-softokn-freebl-3.12.6-2.fc12.1.x86_64 openssl-1.0.0a-1.fc12.x86_64
(gdb) bt
#0  0x00007ffff7411746 in __strcmp_sse42 () from /lib64/libc.so.6
#1  0x00007ffff7ac9d79 in virDomainObjListSearchName (payload=0x7fffdc004060, name=<value optimized out>, data=0x7fffd8009850) at conf/domain_conf.c:367
#2  0x00007ffff7ab476e in virHashSearch (table=0x6f59a0, iter=0x7ffff7ac9d50 <virDomainObjListSearchName>, data=0x7fffd8009850) at util/hash.c:582
#3  0x00007ffff7ac9d33 in virDomainFindByName (doms=<value optimized out>, name=0x7fffd8009850 "testserver-d") at conf/domain_conf.c:377
#4  0x00000000004430f6 in qemudDomainLookupByName (conn=0x7fffe00009f0, name=0x7fffd8009850 "testserver-d") at qemu/qemu_driver.c:4166
#5  0x00007ffff7af95cd in virDomainLookupByName (conn=0x7fffe00009f0, name=0x7fffd8009850 "testserver-d") at libvirt.c:2169
#6  0x0000000000423e64 in remoteDispatchDomainLookupByName (server=<value optimized out>, client=<value optimized out>, conn=0x7fffe00009f0, hdr=<value optimized out>, rerr=0x7fffefffec70, args=<value optimized out>, ret=0x7fffefffebc0) at remote.c:2030
#7  0x0000000000426a91 in remoteDispatchClientCall (server=<value optimized out>, client=0x7ffff0095590, msg=0x7ffff00520e0) at dispatch.c:506
#8  0x0000000000426e43 in remoteDispatchClientRequest (server=0x6e3cd0, client=0x7ffff0095590, msg=0x7ffff00520e0) at dispatch.c:388
#9  0x0000000000417ed8 in qemudWorker (data=0x7ffff00008d8) at libvirtd.c:1568
#10 0x00007ffff7878a3a in start_thread () from /lib64/libpthread.so.0
#11 0x00007ffff73d377d in clone () from /lib64/libc.so.6
#12 0x0000000000000000 in ?? ()
(gdb)
=====================

This one happened when starting up a previously defined domain. Restarting libvirtd and trying the same thing again resulted in a running domain, with no crashes anywhere... (There were no changes in configuration or installed software between the two attempts.)

Guido

Another segfault, again after calling list in virsh after a domain failed to start:

=====================
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffef5fe710 (LWP 30490)]
0x00007ffff7cd5cfd in virDomainObjListCountActive (payload=0x7fffdc006ef0, name=<value optimized out>, data=0x7fffef5fdb0c) at conf/domain_conf.c:6769
6769            if (virDomainObjIsActive(obj))
Missing separate debuginfos, use: debuginfo-install glibc-2.11.2-1.x86_64 nss-softokn-freebl-3.12.6-2.fc12.1.x86_64 openssl-1.0.0a-1.fc12.x86_64
(gdb) bt
#0  0x00007ffff7cd5cfd in virDomainObjListCountActive (payload=0x7fffdc006ef0, name=<value optimized out>, data=0x7fffef5fdb0c) at conf/domain_conf.c:6769
#1  0x00007ffff7cc06ca in virHashForEach (table=0x6f9820, iter=0x7ffff7cd5ce0 <virDomainObjListCountActive>, data=<value optimized out>) at util/hash.c:495
#2  0x00007ffff7cd5224 in virDomainObjListNumOfDomains (doms=<value optimized out>, active=<value optimized out>) at conf/domain_conf.c:6788
#3  0x0000000000438418 in qemudNumDomains (conn=<value optimized out>) at qemu/qemu_driver.c:4260
#4  0x00007ffff7d05989 in virConnectNumOfDomains (conn=0x7fffe4000e50) at libvirt.c:1903
#5  0x0000000000422d2c in remoteDispatchNumOfDomains (server=<value optimized out>, client=<value optimized out>, conn=0x7fffe4000e50, hdr=<value optimized out>, rerr=0x7fffef5fdc70, args=<value optimized out>, ret=0x7fffef5fdbc0) at remote.c:2905
#6  0x0000000000426bc1 in remoteDispatchClientCall (server=<value optimized out>, client=0x7ffff0053a90, msg=0x7ffff0012240) at dispatch.c:506
#7  0x0000000000426f73 in remoteDispatchClientRequest (server=0x6e3cd0, client=0x7ffff0053a90, msg=0x7ffff0012240) at dispatch.c:388
#8  0x0000000000417ed8 in qemudWorker (data=0x7ffff0000920) at libvirtd.c:1568
#9  0x0000003818c06a3a in start_thread () from /lib64/libpthread.so.0
#10 0x00000038188de77d in clone () from /lib64/libc.so.6
#11 0x0000000000000000 in ?? ()
(gdb) info locals
obj = 0x7fffdc006ef0
(gdb) inspect *obj
$2 = {lock = {lock = {__data = {__lock = 0, __count = 0, __owner = 825373486, __nusers = 1701060666, __kind = 543651170, __spins = 1769349178, __list = {__prev = 0x7463656e6e6f4372, __next = 0x6d6f44664f6d754e}}, __size = "\000\000\000\000\000\000\000\000.321: debug : virConnectNumOfDom", __align = 0}}, refs = 1936615777, pid = 959983930, state = 540680242, autostart = 1, persistent = 1, def = 0x656666663778303d, newDef = 0xa30356530303034, snapshots = {objs = 0x70797420313d7200}, current_snapshot = 0x7461747320303d65, privateData = 0x72657320303d7375, privateDataFreeFunc = 0x3d6c6169}
(gdb) print data
$4 = (void *) 0x7fffef5fdb0c
(gdb) x 0x7fffef5fdb0c
0x7fffef5fdb0c: 0x00000016
(gdb) print obj->def
$6 = (virDomainDefPtr) 0x656666663778303d
(gdb) print *(obj->def)
Cannot access memory at address 0x656666663778303d
(gdb) up
#1  0x00007ffff7cc06ca in virHashForEach (table=0x6f9820, iter=0x7ffff7cd5ce0 <virDomainObjListCountActive>, data=<value optimized out>) at util/hash.c:495
495             iter(entry->payload, entry->name, data);
(gdb) info locals
entry = 0x7fffdc00e860
i = <value optimized out>
count = <value optimized out>
(gdb) print entry
$7 = (virHashEntry *) 0x7fffdc00e860
(gdb) print *entry
$8 = {next = 0x0, name = 0x7fffdc00ea90 "654c9839-db0e-ab95-5fad-7c91d9e7c9c4", payload = 0x7fffdc006ef0, valid = 1}
(gdb) print {virDomainObj} entry->payload
$9 = {lock = {lock = {__data = {__lock = 0, __count = 0, __owner = 825373486, __nusers = 1701060666, __kind = 543651170, __spins = 1769349178, __list = {__prev = 0x7463656e6e6f4372, __next = 0x6d6f44664f6d754e}}, __size = "\000\000\000\000\000\000\000\000.321: debug : virConnectNumOfDom", __align = 0}}, refs = 1936615777, pid = 959983930, state = 540680242, autostart = 1, persistent = 1, def = 0x656666663778303d, newDef = 0xa30356530303034, snapshots = {objs = 0x70797420313d7200}, current_snapshot = 0x7461747320303d65, privateData = 0x72657320303d7375, privateDataFreeFunc = 0x3d6c6169}
(gdb) print entry->payload
$10 = (void *) 0x7fffdc006ef0
(gdb) print {virDomainDef} 0xa30356530303034
Cannot access memory at address 0xa30356530303034
(gdb) print {virDomainDef} 0x656666663778303d
Cannot access memory at address 0x656666663778303d
(gdb) up
#2  0x00007ffff7cd5224 in virDomainObjListNumOfDomains (doms=<value optimized out>, active=<value optimized out>) at conf/domain_conf.c:6788
6788            virHashForEach(doms->objs, virDomainObjListCountActive, &count);
(gdb) info locals
count = 22
=====================

(I have tried to piece together some more information about what happened with what little gdb skills I possess... The less interesting bits, where I struggled to get useful information out of gdb, are cut out.)

It looks like the real problem is that the def and newDef pointers of the last virDomainObj point to unallocated memory, making libvirtd crash in static inline int virDomainObjIsActive(virDomainObjPtr dom), where it calls return dom->def->id != -1;.

Guido
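To make the failure mode concrete: the inline quoted above dereferences dom->def with no liveness check, so a virDomainObj that has been freed while still reachable from the driver's hash table sends any domain-list operation straight into garbage memory. A compilable sketch (the struct layouts here are abbreviated stand-ins; only the inline itself is the one the thread quotes from conf/domain_conf.h of this era):

=====================
typedef struct _virDomainDef { int id; /* -1 while inactive */ } virDomainDef;
typedef struct _virDomainObj {
    int refs;
    virDomainDef *def;     /* 0x656666663778303d in the dump above:
                            * freed memory, since overwritten by log text */
    virDomainDef *newDef;
} virDomainObj;
typedef virDomainObj *virDomainObjPtr;

static inline int
virDomainObjIsActive(virDomainObjPtr dom)
{
    return dom->def->id != -1;   /* if 'dom' was freed while still hashed,
                                  * this dereferences garbage and faults */
}
=====================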

On Mon, Jun 28, 2010 at 06:06:00PM +0200, Guido Winkelmann wrote:
Another segfault, again after calling list in virsh after a domain failed to start:
I haven't reproduced the crashes, but try this patch which I think might solve one possible flaw.

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 6ae4e8c..26d935a 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -1178,9 +1178,10 @@ static void qemuHandleMonitorDestroy(qemuMonitorPtr mon,
                                      virDomainObjPtr vm)
 {
     qemuDomainObjPrivatePtr priv = vm->privateData;
-    if (priv->mon == mon)
+    if (mon && (priv->mon == mon)) {
         priv->mon = NULL;
-    virDomainObjUnref(vm);
+        virDomainObjUnref(vm);
+    }
 }

 static qemuMonitorCallbacks monitorCallbacks = {
@@ -1212,6 +1213,8 @@ qemuConnectMonitor(struct qemud_driver *driver, virDomainObjPtr vm)
      * deleted while the monitor is active */
     virDomainObjRef(vm);

+    priv->mon = NULL; /* Explicitly nullify it so destroy callback sees NULL
+                       * if it is invoked during construction */
     priv->mon = qemuMonitorOpen(vm,
                                 priv->monConfig,
                                 priv->monJSON,

Daniel
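To see the flaw the first hunk guards against in isolation: before the change, the unref in the destroy handler ran unconditionally, so a destroy callback firing out of a failed monitor construction dropped the caller's reference once, and the caller's own error path dropped it again, freeing the domain object while the driver's hash table still pointed at it. A reduced, compilable sketch (names such as domain_obj and monitor_open are illustrative stand-ins, not libvirt's real API):

=====================
#include <stdlib.h>

typedef struct domain_obj { int refs; struct monitor *mon; } domain_obj;
typedef struct monitor monitor;
typedef void (*destroy_fn)(monitor *mon, domain_obj *vm);

monitor *monitor_open(domain_obj *vm, destroy_fn cb);  /* NULL on failure */

static void domain_obj_unref(domain_obj *vm)
{
    if (--vm->refs == 0)
        free(vm);          /* object gone, but possibly still hashed */
}

/* Pre-fix behaviour: the unref ran whenever the callback fired,
 * including from monitor_open()'s own failure paths. */
static void handle_monitor_destroy(monitor *mon, domain_obj *vm)
{
    if (vm->mon == mon)
        vm->mon = NULL;
    domain_obj_unref(vm);                  /* unref #1 */
}

static int connect_monitor(domain_obj *vm)
{
    vm->refs++;                            /* ref held for the monitor */
    vm->mon = monitor_open(vm, handle_monitor_destroy);
    if (vm->mon == NULL) {
        domain_obj_unref(vm);              /* unref #2: combined with #1,
                                            * the object is freed while
                                            * domain lists still walk it */
        return -1;
    }
    return 0;
}
=====================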

On Monday, 28 June 2010, you wrote:
On Mon, Jun 28, 2010 at 06:06:00PM +0200, Guido Winkelmann wrote:
Another segfault, again after calling list in virsh after a domain failed to start:
I haven't reproduced the crashes, but try this patch which I think might solve one possible flaw.
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 6ae4e8c..26d935a 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -1178,9 +1178,10 @@ static void qemuHandleMonitorDestroy(qemuMonitorPtr mon,
                                      virDomainObjPtr vm)
 {
     qemuDomainObjPrivatePtr priv = vm->privateData;
-    if (priv->mon == mon)
+    if (mon && (priv->mon == mon)) {
         priv->mon = NULL;
-    virDomainObjUnref(vm);
+        virDomainObjUnref(vm);
+    }
 }

 static qemuMonitorCallbacks monitorCallbacks = {
@@ -1212,6 +1213,8 @@ qemuConnectMonitor(struct qemud_driver *driver, virDomainObjPtr vm)
      * deleted while the monitor is active */
     virDomainObjRef(vm);

+    priv->mon = NULL; /* Explicitly nullify it so destroy callback sees NULL
+                       * if it is invoked during construction */
     priv->mon = qemuMonitorOpen(vm,
                                 priv->monConfig,
                                 priv->monJSON,
Looks good so far. There is still a problem with domains just not starting up (sometimes / most of the time) if the host is under some load, but at least it doesn't seem to crash from a simple list --all in virsh anymore.

Guido

On Mon, Jun 28, 2010 at 07:29:43PM +0200, Guido Winkelmann wrote:
On Monday, 28 June 2010, you wrote:
On Mon, Jun 28, 2010 at 06:06:00PM +0200, Guido Winkelmann wrote:
Another segfault, again after calling list in virsh after a domain failed to start:
I haven't reproduced the crashes, but try this patch which I think might solve one possible flaw.
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 6ae4e8c..26d935a 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -1178,9 +1178,10 @@ static void qemuHandleMonitorDestroy(qemuMonitorPtr mon,
                                      virDomainObjPtr vm)
 {
     qemuDomainObjPrivatePtr priv = vm->privateData;
-    if (priv->mon == mon)
+    if (mon && (priv->mon == mon)) {
         priv->mon = NULL;
-    virDomainObjUnref(vm);
+        virDomainObjUnref(vm);
+    }
 }

 static qemuMonitorCallbacks monitorCallbacks = {
@@ -1212,6 +1213,8 @@ qemuConnectMonitor(struct qemud_driver *driver, virDomainObjPtr vm)
      * deleted while the monitor is active */
     virDomainObjRef(vm);

+    priv->mon = NULL; /* Explicitly nullify it so destroy callback sees NULL
+                       * if it is invoked during construction */
     priv->mon = qemuMonitorOpen(vm,
                                 priv->monConfig,
                                 priv->monJSON,
Looks good so far. There is still a problem with domains just not starting up (sometimes / most of the time) if the host is under some load, but at least it doesn't seem to crash from a simple list --all in virsh anymore.
Actually that patch wasn't very nice, so I've prepared a different one which should fix the problem in a better way. Separately, I'd like to know what errors you get when QEMU fails to start ?

Some, but not all, codepaths in the qemuMonitorOpen() method would trigger the destroy callback. The caller does not expect this to be invoked if construction fails, only during normal release of the monitor. This resulted in a possible double-unref of the virDomainObjPtr, because the caller explicitly unrefs the virDomainObjPtr if qemuMonitorOpen() fails.

* src/qemu/qemu_monitor.c: Don't invoke destroy callback from
  qemuMonitorOpen() failure paths
---
 src/qemu/qemu_monitor.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index f428665..ff613a0 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -671,6 +671,12 @@ qemuMonitorOpen(virDomainObjPtr vm,
     return mon;

 cleanup:
+    /* We don't want the 'destroy' callback invoked during
+     * cleanup from construction failure, because that can
+     * give a double-unref on virDomainObjPtr in the caller,
+     * so kill the callbacks now.
+     */
+    mon->cb = NULL;
     qemuMonitorUnlock(mon);
     qemuMonitorClose(mon);
     return NULL;
--
1.6.6.1

Daniel
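The fix illustrates a reusable pattern: disarm an object's callbacks before tearing it down from inside its own constructor, so a construction failure is never reported to the owner as a normal destroy. A self-contained sketch under the same illustrative names as above (monitor_connect is an assumed stub that returns -1 on failure):

=====================
#include <stdlib.h>

typedef struct monitor monitor;
typedef void (*destroy_fn)(monitor *mon, void *opaque);
struct monitor { int fd; destroy_fn cb; void *opaque; };

int monitor_connect(const char *path);    /* assumed: -1 on failure */

static void monitor_close(monitor *mon)
{
    if (mon->cb)            /* normal teardown: tell the owner it's gone */
        mon->cb(mon, mon->opaque);
    free(mon);
}

static monitor *monitor_open(const char *path, void *opaque, destroy_fn cb)
{
    monitor *mon = calloc(1, sizeof(*mon));
    if (!mon)
        return NULL;
    mon->cb = cb;
    mon->opaque = opaque;

    mon->fd = monitor_connect(path);
    if (mon->fd < 0)
        goto cleanup;
    return mon;

cleanup:
    /* Disarm before tearing down, which is what the patch's
     * 'mon->cb = NULL' does: a construction failure must not look
     * like a normal destroy, or the caller's own error handling
     * drops the domain object's reference a second time. */
    mon->cb = NULL;
    monitor_close(mon);
    return NULL;
}
=====================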

On Tuesday, 29 June 2010, Daniel P. Berrange wrote:
Actually that patch wasn't very nice, so I've prepared a different one which should fix the problem in a better way. Separately, I'd like to know what errors you get when QEMU fails to start ?
I've filed a bug report about this: https://bugzilla.redhat.com/show_bug.cgi?id=609575

I'm afraid I'm not getting any really useful error messages... (see the bug report for details.)

Guido

On Thursday, 1 July 2010, Guido Winkelmann wrote:
On Tuesday, 29 June 2010, Daniel P. Berrange wrote:
Actually that patch wasn't very nice, so I've prepared a different one which should fix the problem in a better way. Separately, I'd like to know what errors you get when QEMU fails to start ?
I've filed a bug report about this: https://bugzilla.redhat.com/show_bug.cgi?id=609575
I'm afraid I'm not getting any really useful error messages... (see the bug report for details.)
I have been experimenting with starting qemu manually with the same command line that libvirt would use, and I have found that if I leave out the "-nodefaults" parameter, the VM will start up reliably again.

What are the reasons for libvirt to use that parameter in the first place? What bad things might happen if I just leave it out all the time (or patch my local copy of libvirt that way)?

Guido
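For anyone repeating that experiment: the first line of the per-domain log quoted earlier is the complete environment-plus-argv that libvirtd spawned, so it can be extracted and replayed by hand. A sketch, using the /usr/local prefix from this thread (note the logged command includes -S, which leaves the guest paused until 'cont' is issued on its monitor):

=====================
# Pull out the exact spawn line libvirtd logged and run it in a shell
head -n 1 /usr/local/var/log/libvirt/qemu/testserver-a.log > /tmp/replay.sh
sh -x /tmp/replay.sh
=====================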

On Thu, Jul 15, 2010 at 07:00:50PM +0200, Guido Winkelmann wrote:
On Thursday, 1 July 2010, Guido Winkelmann wrote:
On Tuesday, 29 June 2010, Daniel P. Berrange wrote:
Actually that patch wasn't very nice, so I've prepared a different one which should fix the problem in a better way. Separately, I'd like to know what errors you get when QEMU fails to start ?
I've filed a bug report about this: https://bugzilla.redhat.com/show_bug.cgi?id=609575
I'm afraid I'm not getting any really useful error messages... (see the bug report for details.)
I have been experimenting with starting qemu manually with the same commandline that libvirt would use, and I have found that if I leave out the "-nodefaults" parameter, the VM will start up reliably again.
What are the reasons for libvirt to use that parameter in the first place?
By default QEMU creates a whole bunch of extra devices (serial ports, parallel ports, IDE cdrom & god knows what other junk). -nodefaults removes all this so you get a reliable & predictable set of hardware.
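A minimal before/after pair makes the difference visible (illustrative invocations only; flags as in qemu-kvm 0.12):

=====================
# Default machine: QEMU quietly adds a VGA adapter, an IDE bus,
# a floppy controller, serial and parallel ports, a default NIC, ...
qemu-kvm -m 256 -hda /data/disk.img

# With -nodefaults, only the explicitly requested devices exist
qemu-kvm -nodefaults -m 256 \
    -drive file=/data/disk.img,if=virtio \
    -vga cirrus -usb
=====================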
What bad things might happen if I just leave it out all the time (or patch my local copy of libvirt that way)?
You'll get random hardware added to your guest. I don't think -nodefaults is the cause of the problem - it will merely be highlighting a problem elsewhere in QEMU.

Daniel