
Hi,
Thanks for your answer, Daniel.
I think the best way to get more information about this bug is to reproduce it with the libvirt master branch. However, I'm facing an issue when I try to run my own daemon: there is a child process which is failing in a loop.
I used sudo strace -f ./libvirtd to watch all child processes, and this is the output:
https://gist.github.com/Wenzel/620327a9b3b7e13454108c2e472eaa77
One of them is continuously failing, and I don't clearly understand why. Could this log file help you identify the cause?
I'm here to provide more information if needed.
Thank you!
2018-03-27 16:12 GMT+03:00 Daniel P. Berrangé <berrange@redhat.com>:
On Tue, Mar 27, 2018 at 04:04:33PM +0300, Mathieu Tarral wrote:
Are you sure this is a different thread? It looks identical to the first stack trace you give above.
Yes, the first one is calling libvirtmod.virDomainGetState and the second one libvirtmod.virDomainIsActive.
Interesting. This is an identical stack trace - so we have 2 Python threads both calling virDomainIsActive(). Nothing wrong with that per se - we support multithreaded usage like this.
virDomainGetState() and virDomainIsActive()
Oops, yes, I see.
Can you confirm there are no other threads running libvirt code in your python app ? Did you have any thread running the libvirt event loop perhaps ?
Actually, I found 2 other threads in the Python app calling libvirt.
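One way to check which Python threads are inside libvirt at a given moment is to dump the Python-level stack of every thread from within the application itself. The following is a minimal sketch using only the standard library. The helper name threads_matching and the demo worker are hypothetical, and filtering on "libvirt" in the file name assumes the bindings' Python wrapper lives in a file whose path contains that string.

```python
import sys
import threading
import traceback


def threads_matching(match):
    """Return {thread name: formatted stack} for every thread whose
    Python stack contains at least one frame accepted by match().

    match is called with traceback.FrameSummary objects, which expose
    .filename, .lineno and .name (the function name).
    """
    names = {t.ident: t.name for t in threading.enumerate()}
    hits = {}
    # sys._current_frames() maps thread id -> topmost Python frame.
    for ident, frame in sys._current_frames().items():
        stack = traceback.extract_stack(frame)
        if any(match(fs) for fs in stack):
            hits[names.get(ident, str(ident))] = traceback.format_list(stack)
    return hits


if __name__ == "__main__":
    # Demo with a hypothetical worker standing in for a blocked call.
    entered = threading.Event()
    done = threading.Event()

    def demo_worker():
        entered.set()
        done.wait()

    worker = threading.Thread(target=demo_worker, name="demo")
    worker.start()
    entered.wait(5)
    for name, stack in threads_matching(lambda fs: fs.name == "demo_worker").items():
        print(name)
        print("".join(stack))
    done.set()
    worker.join()
```

In the real application the call would be something like threads_matching(lambda fs: "libvirt" in fs.filename), run from a thread that is not itself blocked. It only shows Python frames, so it complements the gdb backtrace rather than replacing it.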
So, as a recap:
(gdb) bt
#0 pthread_sigmask (how=how@entry=0, newmask=<optimized out>, newmask@entry=0x7f4ffd7f8d10, oldmask=oldmask@entry=0x7f4ffd7f8c90) at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:50
This is slightly unusual - pthread_sigmask() should complete in a tiny fraction of a second, so seeing it in the stack trace is odd unless you have very fortuitous timing when taking the stack trace.
#1 0x00007f508e0f52fa in virNetClientIOEventLoop (client=client@entry=0x55a1fde4d2b0, thiscall=thiscall@entry=0x7f4fe005a350) at ../../../src/rpc/virnetclient.c:1659
#2 0x00007f508e0f5a16 in virNetClientIO (thiscall=0x7f4fe005a350, client=0x55a1fde4d2b0) at ../../../src/rpc/virnetclient.c:1944
#3 virNetClientSendInternal (client=client@entry=0x55a1fde4d2b0, msg=msg@entry=0x7f4fe0031f50, expectReply=expectReply@entry=true, nonBlock=nonBlock@entry=false) at ../../../src/rpc/virnetclient.c:2116
#4 0x00007f508e0f7443 in virNetClientSendWithReply (client=client@entry=0x55a1fde4d2b0, msg=msg@entry=0x7f4fe0031f50) at ../../../src/rpc/virnetclient.c:2144
#5 0x00007f508e0f7bf2 in virNetClientProgramCall (prog=prog@entry=0x55a1fdff0f90, client=client@entry=0x55a1fde4d2b0, serial=serial@entry=105, proc=proc@entry=14, noutfds=noutfds@entry=0, outfds=outfds@entry=0x0, ninfds=0x0, infds=0x0, args_filter=0x7f508e0ecba0 <xdr_remote_domain_get_xml_desc_args>, args=0x7f4ffd7f8fe0, ret_filter=0x7f508e0ecbd0 <xdr_remote_domain_get_xml_desc_ret>, ret=0x7f4ffd7f8fd8) at ../../../src/rpc/virnetclientprogram.c:329
#6 0x00007f508e0cdeb4 in callFull (priv=priv@entry=0x55a1fe5ac460, flags=flags@entry=0, fdin=fdin@entry=0x0, fdinlen=fdinlen@entry=0, fdout=fdout@entry=0x0, fdoutlen=fdoutlen@entry=0x0, proc_nr=14, args_filter=0x7f508e0ecba0 <xdr_remote_domain_get_xml_desc_args>, args=0x7f4ffd7f8fe0 "`k.\376\241U", ret_filter=0x7f508e0ecbd0 <xdr_remote_domain_get_xml_desc_ret>, ret=0x7f4ffd7f8fd8 "", conn=<optimized out>) at ../../../src/remote/remote_driver.c:6636
#7 0x00007f508e0d7b58 in call (conn=<optimized out>, ret=0x7f4ffd7f8fd8 "", ret_filter=<optimized out>, args=0x7f4ffd7f8fe0 "`k.\376\241U", args_filter=<optimized out>, proc_nr=14, flags=0, priv=0x55a1fe5ac460) at ../../../src/remote/remote_driver.c:6658
#8 remoteDomainGetXMLDesc (dom=<optimized out>, flags=0) at ../../../src/remote/remote_client_bodies.h:2698
#9 0x00007f508e08f5c1 in virDomainGetXMLDesc (domain=domain@entry=0x55a1fe212da0, flags=0) at ../../../src/libvirt-domain.c:2592
#10 0x00007f508e46c8c0 in libvirt_virDomainGetXMLDesc (self=<optimized out>, args=<optimized out>) at build/libvirt.c:1212
#11 0x000055a1fb4cb6df in PyCFunction_Call () at ../Objects/methodobject.c:109
Aside from the thing mentioned above I don't see any reason why you would have bad problems here.
I don't have anything more useful to suggest, other than to try the very latest libvirt to see if you get the same behaviour. If not, that would point to a bug in old libvirt, but offhand I don't recall anything that would cause the behaviour you see.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
-- Mathieu Tarral