[libvirt-users] QemuDomainObjEndJob called when libvirtd is started and libvirt insists qemu is using the wrong disk source.

List, I was under the impression that I could restart libvirtd without it destroying my VMs, but am not finding that to be true. When I killall libvirtd then my VM's keep running, but then when I start libvirtd it calls qemuDomainObjEndJob:1542 : Stopping job: modify (async=none vm=0x7fb8cc0d8510 name=test) and my domain gets whacked. Any way to disable this behavior? Also, while I'm at it, due to issues with snapshotting, I ended up with two domains where libvirt insists the disk source is incorrect which causes many things to break like this: root@wasvirt1:/etc/libvirt# virsh domblklist wasdev Target Source ------------------------------------------------ vda /glustervol1/vm/wasdev/wasdev.reboot hdc /dev/sr0 Yet: root@wasvirt1:/etc/libvirt# lsof | grep wasdev 2235 /usr/bin/qemu-system-x86_64 /var/log/libvirt/qemu/wasdev.log 2235 /usr/bin/qemu-system-x86_64 /var/log/libvirt/qemu/wasdev.log 2235 /usr/bin/qemu-system-x86_64 /glustervol1/vm/wasdev/wasdev.qcow2 The reason is because a blockcommit --active --pivot doesn't work, I blockjob --abort, then try again, it works the second time, but then the disk is wrong. Help would be greatly appreciated. Thanks, schu
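For reference, a minimal sketch of the snapshot/blockcommit sequence being described above, assuming an external disk-only snapshot on vda of the wasdev domain; the snapshot name and paths are illustrative, not taken from this setup:

# create an external overlay; wasdev.reboot becomes the new active image
virsh snapshot-create-as wasdev reboot --disk-only --atomic
# commit the overlay back into the base image and pivot the domain onto it
virsh blockcommit wasdev vda --active --pivot
# if the pivot fails, cancel the job and retry, then check what libvirt thinks the source is
virsh blockjob wasdev vda --abort
virsh blockcommit wasdev vda --active --pivot
virsh domblklist wasdev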

On 04/20/2015 08:45 PM, Matthew Schumacher wrote:
List,
I was under the impression that I could restart libvirtd without it destroying my VMs, but am not finding that to be true.
If not, then something is wrong.
When I killall libvirtd then my VMs keep running, but then when I start libvirtd it calls qemuDomainObjEndJob:1542 : Stopping job: modify (async=none vm=0x7fb8cc0d8510 name=test)
That message doesn't indicate that a domain is being destroyed, just that libvirt performed some operation on the domain that modified it in some way.
and my domain gets whacked.
If your domain is terminated, there should be an error logged by libvirtd (at least one saying that it got end of file from the monitor, meaning that the qemu process itself terminated unexpectedly; in that case there should be some indication of qemu's problem at the end of /var/log/libvirt/qemu/test.log).
Any way to disable this behavior?
Whatever the bug is that's causing this to happen, the answer is to fix it :-)
Also, while I'm at it, due to issues with snapshotting, I ended up with two domains where libvirt insists the disk source is incorrect which causes many things to break like this:
root@wasvirt1:/etc/libvirt# virsh domblklist wasdev
Target     Source
------------------------------------------------
vda        /glustervol1/vm/wasdev/wasdev.reboot
hdc        /dev/sr0
Yet:
root@wasvirt1:/etc/libvirt# lsof | grep wasdev
2235 /usr/bin/qemu-system-x86_64 /var/log/libvirt/qemu/wasdev.log
2235 /usr/bin/qemu-system-x86_64 /var/log/libvirt/qemu/wasdev.log
2235 /usr/bin/qemu-system-x86_64 /glustervol1/vm/wasdev/wasdev.qcow2
The reason is that a blockcommit --active --pivot doesn't work; I blockjob --abort, then try again, and it works the second time, but then the disk source is wrong.
Help would be greatly appreciated.
Thanks, schu

On 04/20/2015 05:12 PM, Laine Stump wrote:
On 04/20/2015 08:45 PM, Matthew Schumacher wrote:
List,
I was under the impression that I could restart libvirtd without it destroying my VMs, but am not finding that to be true.
If not, then something is wrong.
Here are the logs from the vm:

2015-04-20 23:50:20.387+0000: shutting down
qemu: terminating on signal 15 from pid 25298

And part of the logs from libvirt when I started it. Does this help at all?

2015-04-20 23:50:20.387+0000: 25373: error : virStorageFileBackendFileReadHeader:1443 : Failed to open file '/dev/sr0': No medium found
2015-04-20 23:50:20.387+0000: 25373: debug : virStorageFileReadHeader:2714 : read of storage header 0x7fb8cc0d75f0: ret=-1
2015-04-20 23:50:20.387+0000: 25373: debug : virStorageFileBackendFileDeinit:1369 : deinitializing FS storage file 0x7fb8cc0d75f0 (block:/dev/sr0)
2015-04-20 23:50:20.387+0000: 25373: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x7fb8cc0155b0
2015-04-20 23:50:20.387+0000: 25373: debug : qemuDomainObjEndJob:1542 : Stopping job: modify (async=none vm=0x7fb8cc0d8510 name=test)
2015-04-20 23:50:20.387+0000: 25373: info : virObjectRef:296 : OBJECT_REF: obj=0x7fb8cc0155b0
2015-04-20 23:50:20.387+0000: 25373: debug : qemuProcessStop:5089 : Shutting down vm=0x7fb8cc0d8510 name=test id=6 pid=25245 flags=0
2015-04-20 23:50:20.387+0000: 25373: info : virObjectRef:296 : OBJECT_REF: obj=0x7fb8cc0155b0
2015-04-20 23:50:20.387+0000: 25373: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x7fb8cc0155b0
2015-04-20 23:50:20.387+0000: 25373: debug : virFileClose:102 : Closed fd 19
2015-04-20 23:50:20.387+0000: 25373: info : qemuMonitorClose:904 : QEMU_MONITOR_CLOSE: mon=0x7fb8d0000bd0 refs=2
2015-04-20 23:50:20.387+0000: 25373: info : virEventPollRemoveHandle:186 : EVENT_POLL_REMOVE_HANDLE: watch=10
2015-04-20 23:50:20.387+0000: 25373: debug : virEventPollRemoveHandle:199 : mark delete 9 18
2015-04-20 23:50:20.387+0000: 25373: debug : virEventPollInterruptLocked:726 : Interrupting
2015-04-20 23:50:20.387+0000: 25373: debug : virFileClose:102 : Closed fd 18
2015-04-20 23:50:20.387+0000: 25373: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x7fb8d0000bd0
2015-04-20 23:50:20.387+0000: 25373: debug : qemuProcessKill:5047 : vm=0x7fb8cc0d8510 name=test pid=25245 flags=5
2015-04-20 23:50:20.387+0000: 25373: debug : virProcessKillPainfully:352 : vpid=25245 force=1
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollRunOnce:650 : Poll got 2 event(s)
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchTimeouts:432 : Dispatch 0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:478 : Dispatch 8
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:492 : i=0 w=1
2015-04-20 23:50:20.387+0000: 25298: info : virEventPollDispatchHandles:506 : EVENT_POLL_DISPATCH_HANDLE: watch=1 events=1
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:492 : i=1 w=2
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:492 : i=4 w=5
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:492 : i=5 w=6
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:492 : i=7 w=8
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:492 : i=8 w=9
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:492 : i=9 w=10
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:495 : Skip deleted n=9 w=10 f=18
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:492 : i=10 w=11
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollCleanupTimeouts:525 : Cleanup 0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollCleanupTimeouts:561 : Found 0 out of 0 timeout slots used, releasing 0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollCleanupHandles:574 : Cleanup 11
2015-04-20 23:50:20.387+0000: 25298: info : virEventPollCleanupHandles:587 : EVENT_POLL_PURGE_HANDLE: watch=10
2015-04-20 23:50:20.387+0000: 25298: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x7fb8d0000bd0
2015-04-20 23:50:20.387+0000: 25298: info : virObjectUnref:261 : OBJECT_DISPOSE: obj=0x7fb8d0000bd0
2015-04-20 23:50:20.387+0000: 25298: debug : qemuMonitorDispose:259 : mon=0x7fb8d0000bd0
2015-04-20 23:50:20.387+0000: 25298: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x7fb8cc0d8510
2015-04-20 23:50:20.387+0000: 25298: debug : virEventRunDefaultImpl:305 : running default event implementation
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollCleanupTimeouts:525 : Cleanup 0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollCleanupTimeouts:561 : Found 0 out of 0 timeout slots used, releasing 0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollCleanupHandles:574 : Cleanup 10
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=0 w=1, f=7 e=1 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=1 w=2, f=9 e=1 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=2 w=3, f=11 e=0 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=3 w=4, f=12 e=0 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=4 w=5, f=6 e=1 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=5 w=6, f=13 e=1 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=6 w=7, f=14 e=0 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=7 w=8, f=14 e=1 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=8 w=9, f=16 e=1 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=9 w=11, f=20 e=1 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollCalculateTimeout:338 : Calculate expiry of 0 timers
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollCalculateTimeout:370 : No timeout is pending
2015-04-20 23:50:20.387+0000: 25298: info : virEventPollRunOnce:640 : EVENT_POLL_RUN: nhandles=7 timeout=-1

On 04/20/2015 09:28 PM, Matthew Schumacher wrote:
On 04/20/2015 05:12 PM, Laine Stump wrote:
On 04/20/2015 08:45 PM, Matthew Schumacher wrote:
List,
I was under the impression that I could restart libvirtd without it destroying my VMs, but am not finding that to be true.
If not, then something is wrong.
Here are the logs from the vm:
2015-04-20 23:50:20.387+0000: shutting down qemu: terminating on signal 15 from pid 25298
And part of the logs from libvirt when I started it. Does this help at all?
2015-04-20 23:50:20.387+0000: 25373: error : virStorageFileBackendFileReadHeader:1443 : Failed to open file '/dev/sr0': No medium found
2015-04-20 23:50:20.387+0000: 25373: debug : virStorageFileReadHeader:2714 : read of storage header 0x7fb8cc0d75f0: ret=-1
2015-04-20 23:50:20.387+0000: 25373: debug : virStorageFileBackendFileDeinit:1369 : deinitializing FS storage file 0x7fb8cc0d75f0 (block:/dev/sr0)
2015-04-20 23:50:20.387+0000: 25373: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x7fb8cc0155b0
2015-04-20 23:50:20.387+0000: 25373: debug : qemuDomainObjEndJob:1542 : Stopping job: modify (async=none vm=0x7fb8cc0d8510 name=test)
2015-04-20 23:50:20.387+0000: 25373: info : virObjectRef:296 : OBJECT_REF: obj=0x7fb8cc0155b0
2015-04-20 23:50:20.387+0000: 25373: debug : qemuProcessStop:5089 : Shutting down vm=0x7fb8cc0d8510 name=test id=6 pid=25245 flags=0
2015-04-20 23:50:20.387+0000: 25373: info : virObjectRef:296 : OBJECT_REF: obj=0x7fb8cc0155b0
2015-04-20 23:50:20.387+0000: 25373: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x7fb8cc0155b0
2015-04-20 23:50:20.387+0000: 25373: debug : virFileClose:102 : Closed fd 19
2015-04-20 23:50:20.387+0000: 25373: info : qemuMonitorClose:904 : QEMU_MONITOR_CLOSE: mon=0x7fb8d0000bd0 refs=2
2015-04-20 23:50:20.387+0000: 25373: info : virEventPollRemoveHandle:186 : EVENT_POLL_REMOVE_HANDLE: watch=10
2015-04-20 23:50:20.387+0000: 25373: debug : virEventPollRemoveHandle:199 : mark delete 9 18
2015-04-20 23:50:20.387+0000: 25373: debug : virEventPollInterruptLocked:726 : Interrupting
2015-04-20 23:50:20.387+0000: 25373: debug : virFileClose:102 : Closed fd 18
2015-04-20 23:50:20.387+0000: 25373: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x7fb8d0000bd0
2015-04-20 23:50:20.387+0000: 25373: debug : qemuProcessKill:5047 : vm=0x7fb8cc0d8510 name=test pid=25245 flags=5
Well, this is when the qemu process is killed (I'm pretty clever, eh? :-)
I have 3 questions:
1) what version of libvirt are you running and on what distro?
2) can you reproduce this reliably?
3) If the answer to 2 is "yes", do you have the libvirt-debuginfo package installed, and can you stop libvirtd, then run it under gdb with a breakpoint at qemuProcessKill? Once you hit that breakpoint, you could get a backtrace with the "bt" command and that might give us some useful info about where it is coming from.
Oh, by the way, are these perhaps transient domains started with virDomainCreate*? Or are they standard persistent domains defined with virDomainDefineXML?
(Also, I'll just mention that qemuProcessKill() is called from qemuProcessStop(), and I notice that is called in a couple places related to snapshot create and revert which you mentioned in the original message. Since this isn't normal behavior, and the snapshot stuff is under active development, I wonder if this is related...)
2015-04-20 23:50:20.387+0000: 25373: debug : virProcessKillPainfully:352 : vpid=25245 force=1
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollRunOnce:650 : Poll got 2 event(s)
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchTimeouts:432 : Dispatch 0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:478 : Dispatch 8
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:492 : i=0 w=1
2015-04-20 23:50:20.387+0000: 25298: info : virEventPollDispatchHandles:506 : EVENT_POLL_DISPATCH_HANDLE: watch=1 events=1
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:492 : i=1 w=2
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:492 : i=4 w=5
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:492 : i=5 w=6
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:492 : i=7 w=8
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:492 : i=8 w=9
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:492 : i=9 w=10
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:495 : Skip deleted n=9 w=10 f=18
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollDispatchHandles:492 : i=10 w=11
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollCleanupTimeouts:525 : Cleanup 0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollCleanupTimeouts:561 : Found 0 out of 0 timeout slots used, releasing 0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollCleanupHandles:574 : Cleanup 11
2015-04-20 23:50:20.387+0000: 25298: info : virEventPollCleanupHandles:587 : EVENT_POLL_PURGE_HANDLE: watch=10
2015-04-20 23:50:20.387+0000: 25298: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x7fb8d0000bd0
2015-04-20 23:50:20.387+0000: 25298: info : virObjectUnref:261 : OBJECT_DISPOSE: obj=0x7fb8d0000bd0
2015-04-20 23:50:20.387+0000: 25298: debug : qemuMonitorDispose:259 : mon=0x7fb8d0000bd0
2015-04-20 23:50:20.387+0000: 25298: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x7fb8cc0d8510
2015-04-20 23:50:20.387+0000: 25298: debug : virEventRunDefaultImpl:305 : running default event implementation
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollCleanupTimeouts:525 : Cleanup 0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollCleanupTimeouts:561 : Found 0 out of 0 timeout slots used, releasing 0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollCleanupHandles:574 : Cleanup 10
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=0 w=1, f=7 e=1 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=1 w=2, f=9 e=1 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=2 w=3, f=11 e=0 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=3 w=4, f=12 e=0 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=4 w=5, f=6 e=1 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=5 w=6, f=13 e=1 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=6 w=7, f=14 e=0 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=7 w=8, f=14 e=1 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=8 w=9, f=16 e=1 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollMakePollFDs:400 : Prepare n=9 w=11, f=20 e=1 d=0
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollCalculateTimeout:338 : Calculate expiry of 0 timers
2015-04-20 23:50:20.387+0000: 25298: debug : virEventPollCalculateTimeout:370 : No timeout is pending
2015-04-20 23:50:20.387+0000: 25298: info : virEventPollRunOnce:640 : EVENT_POLL_RUN: nhandles=7 timeout=-1

On 04/20/2015 05:42 PM, Laine Stump wrote:
Well, this is when the qemu process is killed (I'm pretty clever, eh? :-)
I have 3 questions:
1) what version of libvirt are you running and on what distro?
2) can you reproduce this reliably?
3) If the answer to 2 is "yes", do you have the libvirt-debuginfo package installed, and can you stop libvirtd, then run it under gdb with a breakpoint at qemuProcessKill? Once you hit that breakpoint, you could get a backtrace with the "bt" command and that might give us some useful info about where it is coming from.
Oh, by the way, are these perhaps transient domains started with virDomainCreate*? Or are they standard persistent domains defined with virDomainDefineXML?
(Also, I'll just mention that qemuProcessKill() is called from qemuProcessStop(), and I notice that is called in a couple places related to snapshot create and revert which you mentioned in the original message. Since this isn't normal behavior, and the snapshot stuff is under active development, I wonder if this is related...)
Laine,

First, thanks for your help, I appreciate it. Your questions:

1. # libvirtd -V
libvirtd (libvirt) 1.2.14

2. I can reproduce this every time.

3. I compile it myself, so I'll work on getting a libvirt-debug build going.

So this is compiled on Slackware64-14.1, as I prefer a more bsd linux and like the simplicity of a traditional init, as well as the ability to cook this down into a very simple usb based distro not unlike slax.

Earlier I posted a bug (https://bugzilla.redhat.com/show_bug.cgi?id=1210903) and got a reply from Shanzhi that said that he was unable to reproduce using libvirt-1.2.14-1.el7.x86_64 and qemu-kvm-rhev-2.2.0-8.el7.x86_64, but I don't see his comment on the ticket anymore, so I wonder if he was able to reproduce and removed his comment, or perhaps his comment got removed. If it does indeed work on the Redhat packages, then that seems to indicate that there is something that doesn't work when libvirt is built on non-redhat systems, which may be related.

Let me do some more digging and I'll get back to you tomorrow.

Thanks!
schu
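In case it helps anyone reproducing this on a self-built libvirt: a separate debug build isn't strictly needed as long as the binaries are compiled with debug info and not stripped. A sketch, assuming the usual autotools build that libvirt 1.2.x uses (the configure paths are illustrative):

CFLAGS="-g -O0" ./configure --prefix=/usr --libdir=/usr/lib64 --sysconfdir=/etc --localstatedir=/var
make && make install
# quick check that symbols are usable
gdb -p $(pidof libvirtd) -ex "thread apply all bt" -ex detach -ex quit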

----- Original Message -----
From: "Matthew Schumacher" <matt.s@aptalaska.com> To: "Laine Stump" <laine@laine.org>, libvirt-users@redhat.com Sent: Wednesday, April 22, 2015 5:53:34 AM Subject: Re: [libvirt-users] QemuDomainObjEndJob called when libvirtd is started and libvirt insists qemu is using the wrong disk source.
On 04/20/2015 05:42 PM, Laine Stump wrote:
Well, this is when the qemu process is killed (I'm pretty clever, eh? :-)
I have 3 questions:
1) what version of libvirt are you running and on what distro?
2) can you reproduce this reliably?
3) If the answer to 2 is "yes", do you have the libvirt-debuginfo package installed, and can you stop libvirtd, then run it under gdb with a breakpoint at qemuProcessKill? Once you hit that breakpoint, you could get a backtrace with the "bt" command and that might give us some useful info about where it is coming from.
Oh, by the way, are these perhaps transient domains started with virDomainCreate*? Or are they standard persistent domains defined with virDomainDefineXML?
(Also, I'll just mention that qemuProcessKill() is called from qemuProcessStop(), and I notice that is called in a couple places related to snapshot create and revert which you mentioned in the original message. Since this isn't normal behavior, and the snapshot stuff is under active development, I wonder if this is related...)
Laine,
First, thanks for your help, I appreciate it. Your questions:
1. # libvirtd -V libvirtd (libvirt) 1.2.14
2. I can reproduce this every time.
3. I compile it myself, so I'll work on getting a libvirt-debug build going.
So this is compiled on Slackware64-14.1 as I prefer a more bsd linux and like the simplicity of a traditional init, as well as the ability to cook this down into a very simple usb based distro not unlike slax.
Earlier I posted a bug (https://bugzilla.redhat.com/show_bug.cgi?id=1210903) and got a reply from Shanzhi that said that he was unable to reproduce using libvirt-1.2.14-1.el7.x86_64 and qemu-kvm-rhev-2.2.0-8.el7.x86_64, but I don't see his comment on the ticket anymore, so I wonder if he was able to reproduce and removed his comment, or perhaps his comment got removed. If it does indeed work on the Redhat packages, then that seems to indicate that there is something that doesn't work when libvirt is built on non-redhat systems, which may be related.
I can answer this question. :) I didn't delete the comment (you can't delete your own comments in bugzilla); I just set it to private since I mentioned some downstream info. I indeed can't reproduce it, even with libvirt built from the latest libvirt.git on my Fedora 21. For your second problem in that bug, you can refer to bug 1197592.
Let me do some more digging and I'll get back to you tomorrow.
Thanks! schu
--
Regards,
shyu

On 04/21/2015 05:53 PM, Matthew Schumacher wrote:
On 04/20/2015 05:42 PM, Laine Stump wrote:
Well, this is when the qemu process is killed (I'm pretty clever, eh? :-)
I have 3 questions:
1) what version of libvirt are you running and on what distro?
2) can you reproduce this reliably?
3) If the answer to 2 is "yes", do you have the libvirt-debuginfo package installed, and can you stop libvirtd, then run it under gdb with a breakpoint at qemuProcessKill? Once you hit that breakpoint, you could get a backtrace with the "bt" command and that might give us some useful info about where it is coming from.
Oh, by the way, are these perhaps transient domains started with virDomainCreate*? Or are they standard persistent domains defined with virDomainDefineXML?
(Also, I'll just mention that qemuProcessKill() is called from qemuProcessStop(), and I notice that is called in a couple places related to snapshot create and revert which you mentioned in the original message. Since this isn't normal behavior, and the snapshot stuff is under active development, I wonder if this is related...)
Laine,
First, thanks for your help, I appreciate it. Your questions:
1. # libvirtd -V libvirtd (libvirt) 1.2.14
2. I can reproduce this every time.
3. I compile it myself, so I'll work on getting a libvirt-debug build going.
If you're compiling yourself, then you should be all set to run under gdb. libvirt-debuginfo is just a separate subpackage that contains all the symbol and line number info from the build so that backtraces in gdb make sense. Try attaching gdb to the libvirtd process and do something like "thread apply all bt" - if you see function/argument/file names and line numbers, then you already have the symbols available.
So this is compiled on Slackware64-14.1 as I prefer a more bsd linux and like the simplicity of a traditional init, as well as the ability to cook this down into a very simple usb based distro not unlike slax.
Occasionally there are cases where libvirtd terminates guests on restart due to some failure to parse the guest's status XML, inability to connect to the guest's qemu monitor, or some other thing that would render the guest inaccessible. We of course want to eliminate as many of those situations as possible, so we would love to see the backtrace you can get from gdb.
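A sketch of the gdb session being suggested here, assuming libvirtd is stopped first and was built with symbols; since qemuProcessKill lives in the dynamically loaded qemu driver module, gdb may report the breakpoint as pending until that library is loaded:

gdb /usr/sbin/libvirtd
(gdb) break qemuProcessKill
(gdb) run
# libvirtd starts in the foreground; when the breakpoint is hit:
(gdb) bt
(gdb) thread apply all bt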

On 04/22/2015 08:19 AM, Laine Stump wrote:
So this is compiled on Slackware64-14.1 as I prefer a more bsd linux and like the simplicity of a traditional init, as well as the ability to cook this down into a very simple usb based distro not unlike slax.
If you're compiling yourself, then you should be all set to run under gdb. libvirt-debuginfo is just a separate subpackage that contains all the symbol and line number info from the build so that backtraces in gdb make sense. Try attaching gdb to the libvirtd process and do something like "thread apply all bt" - if you see function/argument/file names and line numbers, then you already have the symbols available.
Occasionally there are cases where libvirtd terminates guests on restart due to some failure to parse the guest's status XML, inability to connect to the guest's qemu monitor, or some other thing that would render the guest inaccessible. We of course want to eliminate as many of those situations as possible, so we would love to see the backtrace you can get from gdb.
Laine,

My apologies on the delay getting back to you, I've been pretty slammed lately. Anyway, I have the backtrace you requested.

I simply started libvirtd, started my vm, killed libvirtd, then started it again with gdb in foreground mode with the breakpoint set.

I got a ton of debug info that I posted here:

http://www.aptalaska.net/~matt.s/libvirt.txt

but the backtrace is simply:

Breakpoint 1, 0x00007fffc72a0240 in qemuProcessKill () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
(gdb) bt
#0  0x00007fffc72a0240 in qemuProcessKill () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#1  0x00007fffc72a19cd in qemuProcessStop () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#2  0x00007fffc72a27d6 in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#3  0x00007ffff74b6882 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007ffff5722ce2 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ffff31d38cd in clone () from /lib64/libc.so.6
(gdb)

Thanks!
schu

On 04/28/2015 06:03 PM, Matthew Schumacher wrote:
On 04/22/2015 08:19 AM, Laine Stump wrote:
So this is compiled on Slackware64-14.1 as I prefer a more bsd linux and like the simplicity of a traditional init, as well as the ability to cook this down into a very simple usb based distro not unlike slax.
If you're compiling yourself, then you should be all set to run under gdb. libvirt-debuginfo is just a separate subpackage that contains all the symbol and line number info from the build so that backtraces in gdb make sense. Try attaching gdb to the libvirtd process and do something like "thread apply all bt" - if you see function/argument/file names and line numbers, then you already have the symbols available.
Occasionally there are cases where libvirtd terminates guests on restart due to some failure to parse the guest's status XML, inability to connect to the guest's qemu monitor, or some other thing that would render the guest inaccessible. We of course want to eliminate as many of those situations as possible, so we would love to see the backtrace you can get from gdb.
Laine,
My apologies on the delay getting back to you, I've been pretty slammed lately. Anyway, I have the backtrace you requested.
I simply started libvirtd, started my vm, killed libvirtd, then started it again with gdb in foreground mode with the breakpoint set.
I got a ton of debug info that I posted here:
http://www.aptalaska.net/~matt.s/libvirt.txt
but the backtrace is simply:
Breakpoint 1, 0x00007fffc72a0240 in qemuProcessKill () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
(gdb) bt
#0  0x00007fffc72a0240 in qemuProcessKill () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#1  0x00007fffc72a19cd in qemuProcessStop () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#2  0x00007fffc72a27d6 in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#3  0x00007ffff74b6882 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007ffff5722ce2 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ffff31d38cd in clone () from /lib64/libc.so.6
(gdb)
Bah. Your debug symbols aren't enough to give a full stack trace - what I was really looking for was those things marked as ??.

However, your debug output shows several things leading up to the qemuProcessStop that might have been the reason for it being called:

1) virStorageFileBackendFileReadHeader() complains that there is no medium in /dev/sr0 (i.e. no disk in the CD drive) (this was actually in your much more abbreviated log in an earlier message, but for some reason I skipped over it :-/)

2) libvirt sends the following command to qemu:

{"execute":"block-commit","arguments":{"device":"bogus"},"id":"libvirt-4"}

and gets back this error:

{"id": "libvirt-4", "error": {"class": "DeviceNotFound", "desc": "Device 'bogus' not found"}}

Since (1) actually generates a libvirt error log, that is what I would look into first.
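A quick way to check for the condition in (1), assuming the domain is named test and the cdrom is target hdc; these commands are just one illustrative approach, not the only fix:

# see what source libvirt currently has recorded for the cdrom
virsh dumpxml test | grep -A4 "device='cdrom'"
# eject the host's /dev/sr0 from the guest so libvirtd doesn't try to open it on restart
virsh change-media test hdc --eject --live --config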

On 04/28/2015 05:13 PM, Laine Stump wrote:
Bah. Your debug symbols aren't enough to give a full stack trace - what I was really looking for was those things marked as ??.
However, your debug output shows several things leading up to the qemuProcessStop that might have been the reason for it being called:
1) virStorageFileBackendFileReadHeader() complains that there is no medium in /dev/sr0 (i.e. no disk in the CD drive) (this was actually in your much more abbreviated log in an earlier message, but for some reason I skipped over it :-/)
2) libvirt sends the following command to qemu:
{"execute":"block-commit","arguments":{"device":"bogus"},"id":"libvirt-4"}
and gets back this error:
{"id": "libvirt-4", "error": {"class": "DeviceNotFound", "desc": "Device 'bogus' not found"}}
Since (1) actually generates a libvirt error log, that is what I would look into first.
Thanks Laine,

Confirmed, if I:

virsh attach-disk test '' hdc --type cdrom

Then I can kick libvirtd all I want and it doesn't take out my domain.

So now that that is fixed, I'm back to trying to figure out why blockcopy --wait won't actually wait, as it's hard to migrate domains from one storage to another while needing to continually call blockjob to see if we are 100% mirroring.

I posted plenty of debug here:

https://bugzilla.redhat.com/attachment.cgi?id=1019911

I wonder if the issue isn't obvious to you, since you know a bit more about how libvirt interacts with qemu than I do.

Also, is there a link to some documentation that describes all of the json going between libvirt and qemu?

Thanks for your help!
schu
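For anyone hitting the same thing, a sketch of the storage-migration sequence being attempted, with an illustrative destination path; --wait should block until the copy reaches the mirroring phase, and --pivot then switches the domain onto the new image:

virsh blockcopy wasdev vda /newstorage/wasdev/wasdev.qcow2 --wait --verbose --pivot
# fallback if --wait misbehaves: poll the job and pivot by hand once it reports 100%
virsh blockjob wasdev vda --info
virsh blockjob wasdev vda --pivot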

On 04/28/2015 09:21 PM, Matthew Schumacher wrote:
On 04/28/2015 05:13 PM, Laine Stump wrote:
Bah. Your debug symbols aren't enough to give a full stack trace - what I was really looking for was those things marked as ??.
However, your debug output shows several things leading up to the qemuProcessStop that might have been the reason for it being called:
1) virStorageFileBackendFileReadHeader() complains that there is no medium in /dev/sr0 (i.e. no disk in the CD drive) (this was actually in your much more abbreviated log in an earlier message, but for some reason I skipped over it :-/)
2) libvirt sends the following command to qemu:
{"execute":"block-commit","arguments":{"device":"bogus"},"id":"libvirt-4"}
and gets back this error:
{"id": "libvirt-4", "error": {"class": "DeviceNotFound", "desc": "Device 'bogus' not found"}}
Since (1) actually generates a libvirt error log, that is what I would look into first.
Thanks Laine,
Confirmed, if I:
virsh attach-disk test '' hdc --type cdrom
Then I can kick libvirtd all I want and it doesn't take out my domain.
So now that that is fixed, I'm back to trying to figure out why blockcopy --wait won't actually wait, as it's hard to migrate domains from one storage to another while needing to continually call blockjob to see if we are 100% mirroring.
I posted plenty of debug here: https://bugzilla.redhat.com/attachment.cgi?id=1019911
That is an attachment to: https://bugzilla.redhat.com/show_bug.cgi?id=1210903 and it looks like Eric noticed it and has commented in the bug.
I wonder if the issue isn't obvious to you since you know a bit more about how libvirt interacts with qemu than I do.
Nope. I am ignorant about all the new block operations, but if Eric is on the case you're in good hands.
Also, is there a link to some documentation that describes all of the json going between libvirt and qemu?
If there is, I guess it would be somewhere in qemu documentation, maybe on linux-kvm.org?
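On the JSON question, one way to watch the traffic yourself, assuming a libvirtd that reads /etc/libvirt/libvirtd.conf, is to enable debug logging for the qemu driver, which records every QMP command and reply exchanged over the monitor. A sketch; the filter and output values are just one reasonable choice:

# /etc/libvirt/libvirtd.conf, then restart libvirtd
log_filters="1:qemu"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"

The QMP protocol itself is documented in the qemu source tree.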
participants (3)
- Laine Stump
- Matthew Schumacher
- Shanzhi Yu