On 12/30/2011 06:18 PM, Chip Vincent wrote:
The code looks good, but I'm seeing some strange failures on my system with this patch applied.
System details:
Red Hat Enterprise Linux Workstation release 6.2 (Santiago)
Linux oc4551028142.ibm.com 2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
tog-pegasus-2.11.0-2.el6.x86_64
libvirt-0.9.4-23.el6_2.1.x86_64
I need to dig a bit deeper, but here's what I've got so far.
With CU_DEBUG enabled, here are the last few lines of the libvirt-cim.log:
std_indication.c(124): In raise
std_indication.c(140): Indication is KVM_ResourceAllocationSettingDataCreatedIndication
std_indication.c(67): Indications disabled for this provider
std_invokemethod.c(301): Method `DefineSystem' returned 0
Back trace from core file:
#0 0x00007f28c7251c6a in Pegasus::SCMOInstance::getCIMInstance(Pegasus::CIMInstance&) const () from /usr/lib64/libpegcommon.so.1
#1 0x00007f28c3bb753c in ?? () from /usr/lib64/Pegasus/providerManagers/libCMPIProviderManager.so
#2 0x00007f28c3ba3e2d in ?? () from /usr/lib64/Pegasus/providerManagers/libCMPIProviderManager.so
#3 0x00007f28c2f75c93 in _std_invokemethod () from /usr/lib64/libcmpiutil.so.0
#4 0x00007f28c3b80df2 in Pegasus::CMPIProviderManager::handleInvokeMethodRequest(Pegasus::Message const*) () from /usr/lib64/Pegasus/providerManagers/libCMPIProviderManager.so
#5 0x00007f28c3b923db in Pegasus::CMPIProviderManager::processMessage(Pegasus::Message*) () from /usr/lib64/Pegasus/providerManagers/libCMPIProviderManager.so
#6 0x00007f28c8b4a1c6 in Pegasus::BasicProviderManagerRouter::processMessage(Pegasus::Message*) () from /usr/lib64/libpegpmrouter.so.1
#7 0x00007f28c8f7e738 in ?? ()
#8 0x00007f28c8f7f925 in ?? ()
#9 0x00007f28c72f21c5 in Pegasus::ThreadPool::_loop(void*) () from /usr/lib64/libpegcommon.so.1
#10 0x00007f28c6da07f1 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f28c610670d in clone () from /lib64/libc.so.6
Just to be safe, I removed this patch and re-ran cimtest and *did not*
get the crashes. I do see some failures, but they seem relatively minor.
I then re-applied the patch and could reproduce the failures.
Ouch, my bad. I should have made sure the other test cases were not
impacted by this patch.
The failures appear in different cimtests on most runs. Here's one example:
"ComputerSystem - 05_activate_defined_start.py: FAIL ERROR - Got CIM error CIM_ERR_FAILED: Lost connection with cimprovagt "libvirt-cim". with return code 1".
Once you get the first failure, it appears the follow-on failures are side effects of some form of corruption in libvirt-cim or libvirt. For example:
"ERROR - Got CIM error CIM_ERR_FAILED: The byte sequence starting at index 0 is not valid UTF-8 encoding: 0x89 0x04 0x24 0x48 0x83 0xC4 0x28 0xC3 0x66 0x0F 0x1F with return code 1"
Libvirt is often left in an odd state once I get a failure. I've seen
test VMs left over that cannot be deleted with Virt Mgr, and once I was
unable to connect to libvirt at all, which required a service restart.
I'll post more once I have a chance to experiment more.
While I was developing this feature I was having serious problems: random
crashes, most probably due to race conditions. This seems to be the
problem in this case as well.
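
Just to illustrate the kind of bug I mean (this is only a hypothetical
sketch, not the actual provider code): a shared handle that request
threads read and replace without locking produces exactly this pattern
of crashes that move around between runs.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared state reused by every request thread. */
static char *cached_uri;
static pthread_mutex_t uri_lock = PTHREAD_MUTEX_INITIALIZER;

/* Racy pattern: one thread can free/replace cached_uri while another
 * thread still holds the old pointer -> use-after-free, and the crash
 * shows up in a different place on every run. */
static const char *get_uri_racy(void)
{
        free(cached_uri);
        cached_uri = strdup("qemu:///system");
        return cached_uri;
}

/* Serialized pattern: the shared pointer is only touched under the
 * lock, and the caller gets a private copy. */
static const char *get_uri_locked(char *buf, size_t len)
{
        pthread_mutex_lock(&uri_lock);
        if (cached_uri == NULL)
                cached_uri = strdup("qemu:///system");
        snprintf(buf, len, "%s", cached_uri);
        pthread_mutex_unlock(&uri_lock);

        return buf;
}

static void *worker(void *arg)
{
        char buf[64];
        int i;

        (void)arg;
        for (i = 0; i < 100000; i++)
                get_uri_locked(buf, sizeof(buf));
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        printf("done without crashing\n");
        return 0;
}

When the failing test changes from run to run and the backtrace ends up
deep inside the CIMOM like the one above, unsynchronized access to
shared state in the provider is the first thing I would suspect.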
--
Eduardo de Barros Lima
Software Engineer, Open Virtualization
Linux Technology Center - IBM/Brazil
eblima@br.ibm.com