
On 12/30/2011 06:18 PM, Chip Vincent wrote:
The code looks good but I'm seeing some strange failures on my system with this patch applied.
System details:
Red Hat Enterprise Linux Workstation release 6.2 (Santiago)
Linux oc4551028142.ibm.com 2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
tog-pegasus-2.11.0-2.el6.x86_64
libvirt-0.9.4-23.el6_2.1.x86_64
I need to dig a bit deeper, but here's what I've got so far.
With CU_DEBUG enabled, here's the last few lines of the libvirt-cim.log:
std_indication.c(124): In raise
std_indication.c(140): Indication is KVM_ResourceAllocationSettingDataCreatedIndication
std_indication.c(67): Indications disabled for this provider
std_invokemethod.c(301): Method `DefineSystem' returned 0
Back trace from core file:
#0  0x00007f28c7251c6a in Pegasus::SCMOInstance::getCIMInstance(Pegasus::CIMInstance&) const () from /usr/lib64/libpegcommon.so.1
#1  0x00007f28c3bb753c in ?? () from /usr/lib64/Pegasus/providerManagers/libCMPIProviderManager.so
#2  0x00007f28c3ba3e2d in ?? () from /usr/lib64/Pegasus/providerManagers/libCMPIProviderManager.so
#3  0x00007f28c2f75c93 in _std_invokemethod () from /usr/lib64/libcmpiutil.so.0
#4  0x00007f28c3b80df2 in Pegasus::CMPIProviderManager::handleInvokeMethodRequest(Pegasus::Message const*) () from /usr/lib64/Pegasus/providerManagers/libCMPIProviderManager.so
#5  0x00007f28c3b923db in Pegasus::CMPIProviderManager::processMessage(Pegasus::Message*) () from /usr/lib64/Pegasus/providerManagers/libCMPIProviderManager.so
#6  0x00007f28c8b4a1c6 in Pegasus::BasicProviderManagerRouter::processMessage(Pegasus::Message*) () from /usr/lib64/libpegpmrouter.so.1
#7  0x00007f28c8f7e738 in ?? ()
#8  0x00007f28c8f7f925 in ?? ()
#9  0x00007f28c72f21c5 in Pegasus::ThreadPool::_loop(void*) () from /usr/lib64/libpegcommon.so.1
#10 0x00007f28c6da07f1 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f28c610670d in clone () from /lib64/libc.so.6
Just to be safe, I removed this patch and re-ran cimtest and *did not* get the crashes. I do see some failures, but they seem relatively minor. I then re-applied the patch and could reproduce the failures.
Ouch, my bad. I should have made sure the other test cases were not impacted by this patch.
The failures appear in different cimtests on most runs. Here's one example: "ComputerSystem - 05_activate_defined_start.py: FAIL ERROR - Got CIM error CIM_ERR_FAILED: Lost connection with cimprovagt "libvirt-cim". with return code 1". Once you get the first failure, it appears the follow-on failures are side effects of some form of corruption in libvirt-cim or libvirt. For example: "ERROR - Got CIM error CIM_ERR_FAILED: The byte sequence starting at index 0 is not valid UTF-8 encoding: 0x89 0x04 0x24 0x48 0x83 0xC4 0x28 0xC3 0x66 0x0F 0x1F with return code 1"
Libvirt is often left in an odd state once I get a failure. I've seen test VMs left over that cannot be deleted with Virt Mgr, and once I was unable to connect to libvirt at all, which required a service restart.
I'll post more once I have a chance to experiment more.
While I was developing this feature I ran into serious problems: random crashes, most probably due to race conditions. That seems to be the problem in this case too.

-- 
Eduardo de Barros Lima
Software Engineer, Open Virtualization
Linux Technology Center - IBM/Brazil
eblima@br.ibm.com