[libvirt] Trying to debug "Received unexpected event 3" from libvirt

Hi,

I'm trying to debug this issue, which may be behind my inability to take live snapshots.

1. I'm not sure what 'Waking up a tragedian' in the debug log means - what exactly is a tragedian?
2. In any case, it'd be great if the WARN mentioned mon->await_event - is it the event libvirt is actually waiting for? (Both from qemu/qemu_agent.c)
3. I reckon event 3 is QEMU_AGENT_EVENT_RESET? (from qemu/qemu_agent.h)
4. I'm also getting 'End of file while reading data: Input/output error' messages; not sure what they mean yet.

(Using 1.2.18.2-1 on FC23, trying to live-snapshot VMs running CentOS 6 and 7, all with the QEMU guest agent, AFAIK.)

TIA,
Y.

On 30.12.2015 11:30, Yaniv Kaul wrote:
Hi,
Hey, sorry for getting back to you so late.
I'm trying to debug this issue, which may be behind my inability to take live snapshots. 1. I'm not sure what 'Waking up a tragedian' in the debug log means - what exactly is a tragedian?
It's the thread that has issued the state-changing API (shutdown, reboot, ...) and is waiting for confirmation on the monitor. For instance, a management app issues virDomainPMSuspendForDuration(), which the qemu driver implements via some agent calls. The flow is like this:

1) control reaches qemuDomainPMSuspendForDuration()
2) libvirt does some checks and issues the 'guest-suspend-disk' command (or the command corresponding to the selected target)
3) qemu-ga running inside the guest tries (!) to suspend the guest (it may not necessarily succeed)
4) meanwhile, while the guest is writing to disk (saving its RAM, though one can't tell that from outside), the libvirt API is blocked
5) finally, the guest kernel calls 'HALT', to which qemu responds by sending libvirt a 'RESET' event
6) the libvirt event loop notices that an event occurred on the domain monitor and calls a callback
7) the callback wakes up the sleeping API if the event the API is waiting for matches the one received on the monitor
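The wait/wake-up handshake in steps 2 and 4-7 can be sketched in plain C with POSIX threads. This is a minimal illustration under stated assumptions, not libvirt's code: the type and function names (agent_monitor, monitor_expect(), monitor_wait(), monitor_notify_event()) are hypothetical, libvirt uses its own mutex/condition wrappers, and only the await_event field and the wording of the warning are taken from qemu/qemu_agent.c as discussed in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified model of the agent monitor; libvirt's real
 * structures and locking in qemu/qemu_agent.c are more involved. */
typedef enum {
    EVENT_NONE = 0,
    EVENT_SHUTDOWN,
    EVENT_SUSPEND,
    EVENT_RESET,
} agent_event;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t notify;
    agent_event await_event;   /* event the blocked API is waiting for */
    bool finished;             /* set once that event has been seen */
} agent_monitor;

static void monitor_init(agent_monitor *mon)
{
    pthread_mutex_init(&mon->lock, NULL);
    pthread_cond_init(&mon->notify, NULL);
    mon->await_event = EVENT_NONE;
    mon->finished = false;
}

/* Step 2: before sending the agent command, record what to wait for. */
static void monitor_expect(agent_monitor *mon, agent_event ev)
{
    assert(ev != EVENT_NONE);   /* waiting for "no event" makes no sense */
    pthread_mutex_lock(&mon->lock);
    mon->await_event = ev;
    mon->finished = false;
    pthread_mutex_unlock(&mon->lock);
}

/* Step 4: the API thread (the "tragedian") blocks here. */
static void monitor_wait(agent_monitor *mon)
{
    pthread_mutex_lock(&mon->lock);
    while (!mon->finished)      /* loop guards against spurious wake-ups */
        pthread_cond_wait(&mon->notify, &mon->lock);
    pthread_mutex_unlock(&mon->lock);
}

/* Steps 6-7: called from the event loop when QEMU reports an event.
 * Returns true if the waiting API was woken up. */
static bool monitor_notify_event(agent_monitor *mon, agent_event ev)
{
    bool woke = false;

    pthread_mutex_lock(&mon->lock);
    if (mon->await_event == ev) {
        mon->await_event = EVENT_NONE;
        mon->finished = true;
        pthread_cond_signal(&mon->notify);   /* "Waking up a tragedian" */
        woke = true;
    } else {
        /* Same information as the patched VIR_WARN in this thread. */
        fprintf(stderr, "Received unexpected event %d (expected %d)\n",
                ev, mon->await_event);
    }
    pthread_mutex_unlock(&mon->lock);
    return woke;
}
```

In libvirt the notify side runs in the event loop thread while the API thread sleeps in the wait step. Because `finished` is set under the lock before signalling, the waiter cannot miss the wake-up even if the event arrives before it starts waiting.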
2. In any case, it'd be great if the WARN mentioned mon->await_event - is it the event libvirt is actually waiting for?
Sure, that would be much more helpful - mind posting a patch?
(Both from qemu/qemu_agent.c)
3. I reckon event 3 is QEMU_AGENT_EVENT_RESET? (from qemu/qemu_agent.h)
Correct.
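For reading the warning, a small lookup from the logged integer back to a name can help. This sketch assumes qemuAgentEvent in qemu/qemu_agent.h is declared with NONE = 0 followed by SHUTDOWN, SUSPEND and RESET, which is consistent with event 3 being RESET as confirmed above. agent_event_name() is a hypothetical helper, not a libvirt function, so check the enum in your own tree before relying on the numbering.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout of qemuAgentEvent in qemu/qemu_agent.h; the values are
 * implicit, so they would shift if the enum were ever reordered. */
typedef enum {
    QEMU_AGENT_EVENT_NONE = 0,
    QEMU_AGENT_EVENT_SHUTDOWN,
    QEMU_AGENT_EVENT_SUSPEND,
    QEMU_AGENT_EVENT_RESET,
} qemuAgentEvent;

/* Compile-time check of the value seen in "Received unexpected event 3". */
static_assert(QEMU_AGENT_EVENT_RESET == 3, "RESET is expected to be event 3");

/* Hypothetical helper: map the integer printed by the warning to a name. */
static const char *agent_event_name(int event)
{
    static const char *const names[] = {
        "QEMU_AGENT_EVENT_NONE",
        "QEMU_AGENT_EVENT_SHUTDOWN",
        "QEMU_AGENT_EVENT_SUSPEND",
        "QEMU_AGENT_EVENT_RESET",
    };

    if (event < 0 || (size_t)event >= sizeof(names) / sizeof(names[0]))
        return "unknown";
    return names[event];
}
```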
4. I'm also getting 'End of file while reading data: Input/output error' messages, not sure what they mean yet.
Usually they mean the daemon has crashed. If you are able to get a stack trace, please share it somewhere.

Michal

On Fri, Jan 8, 2016 at 7:00 PM, Michal Privoznik <mprivozn@redhat.com> wrote:
On 30.12.2015 11:30, Yaniv Kaul wrote:
<snip/>
2. In any case, it'd be great if the WARN mentioned mon->await_event - is it the event libvirt is actually waiting for?
Sure, that would be much more helpful - mind posting a patch?
Attached and tested. I can't post it properly, as my git tree is a bit 'dirty' with some po files that I can't clean:

On branch workbranch
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   po/ar.po
        modified:   po/as.po
        modified:   po/bg.po
        modified:   po/bn.po
        modified:   po/bn_IN.po
        modified:   po/bs.po
        modified:   po/ca.po
        modified:   po/cs.po
        modified:   po/cy.po
        modified:   po/da.po
        modified:   po/de.po
        modified:   po/el.po
        modified:   po/en_GB.po
        modified:   po/es.po
        modified:   po/et.po
        modified:   po/fi.po
        modified:   po/fr.po
        modified:   po/gl.po
        modified:   po/gu.po
        modified:   po/he.po
        modified:   po/hi.po
...

Thanks,
Y.
<snip/>

On 14.01.2016 19:51, Yaniv Kaul wrote:
On Fri, Jan 8, 2016 at 7:00 PM, Michal Privoznik <mprivozn@redhat.com> wrote:
On 30.12.2015 11:30, Yaniv Kaul wrote:
<snip/>
Sure, that would be much more helpful - mind posting a patch?
Attached and tested. I can't post it properly, as my git tree is a bit 'dirty' with some po files that I can't clean:
<snip/>
I guess you were doing 'make dist' or 'make rpm'. Both of them regenerate the translation strings in po/. Anyway, you can just drop those changes to get a clean working tree:

  git checkout po/
...
Thanks, Y.
<snip/>
qemu_agent.c.diff
diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
index f979f82..924c177 100644
--- a/src/qemu/qemu_agent.c
+++ b/src/qemu/qemu_agent.c
@@ -1241,7 +1241,7 @@ void qemuAgentNotifyEvent(qemuAgentPtr mon,
         }
     } else {
         /* shouldn't happen but one never knows */
-        VIR_WARN("Received unexpected event %d", event);
+        VIR_WARN("Received unexpected event %d (expected %d)", event, mon->await_event);
     }
 }
ACKed and pushed. I've committed the patch under your name; I hope that's okay with you.

Michal

On Fri, Jan 15, 2016 at 9:37 AM, Michal Privoznik <mprivozn@redhat.com> wrote:
On Fri, Jan 8, 2016 at 7:00 PM, Michal Privoznik <mprivozn@redhat.com> wrote:
On 30.12.2015 11:30, Yaniv Kaul wrote:
<snip/>
Attached and tested. I can't post it properly, as my git tree is a bit 'dirty' with some po files that I can't clean:
<snip/>
I guess you were doing 'make dist' or 'make rpm'. Both of them regenerate the translation strings in po/. Anyway, you can just drop those changes to get a clean working tree: git checkout po/
Indeed, I used 'make rpm', as I could not get 'sudo make install' to install libvirt correctly on my Fedora: I failed to get the right directories set in the './configure' script. I wish it had auto-detected Fedora and used the right defaults.

After failing to do it the normal way, I went and looked again at libvirt.org, where I failed to find a 'how to contribute' document. Only today did I find http://libvirt.org/compiling.html, which has a hint to pass '--system' to autogen.sh. Perhaps it should be on by default?

That's part of a bigger problem: there is no easily found 'how to contribute' section on the site. Using search, the first hit I found was an archived email [1], which pointed me to [2]. That is the right document, but probably in the wrong place (under internals?). And it didn't have the hint above...

[1] http://www.redhat.com/archives/libvir-list/2014-April/msg00004.html
[2] http://libvirt.org/hacking.html
...
Thanks, Y.
<snip/>
ACKed and pushed. I've committed the patch under your name; I hope that's okay with you.
Michal

Yes, much appreciated. I tried to send the patch, but perhaps that doesn't work easily via GMail.
Y.

On 16.01.2016 18:02, Yaniv Kaul wrote:
On Fri, Jan 15, 2016 at 9:37 AM, Michal Privoznik <mprivozn@redhat.com> wrote:
On Fri, Jan 8, 2016 at 7:00 PM, Michal Privoznik <mprivozn@redhat.com> wrote:
On 30.12.2015 11:30, Yaniv Kaul wrote:
<snip/>
I guess you were doing 'make dist' or 'make rpm'. Both of them regenerate the translation strings in po/. Anyway, you can just drop those changes to get a clean working tree: git checkout po/
Indeed, I used 'make rpm', as I could not get 'sudo make install' to install libvirt correctly on my Fedora: I failed to get the right directories set in the './configure' script. I wish it had auto-detected Fedora and used the right defaults.
I don't think this is desirable. Not everybody wants hand-built software overwriting files that belong to an installed package. That's why the default is to install under /usr/local, which I believe is common practice.
After failing to do it the normal way, I went and looked again at libvirt.org, where I failed to find a 'how to contribute' document.
Ah, sorry to hear that. But yes, our documentation could use some tuning.
Only today did I find http://libvirt.org/compiling.html, which has a hint to pass '--system' to autogen.sh. Perhaps it should be on by default?
That's part of a bigger problem: there is no easily found 'how to contribute' section on the site. Using search, the first hit I found was an archived email [1], which pointed me to [2]. That is the right document, but probably in the wrong place (under internals?). And it didn't have the hint above...
If you want, you can propose a patch for that too. Our site is generated from the repo, so if you build the docs:

  libvirt.git $ make -C docs

you will find index.html there, which is built from index.html.in. The same applies to compiling.html and every other *.html file there. So proposing a docs improvement should be easy. I'm glad to help.
[1] http://www.redhat.com/archives/libvir-list/2014-April/msg00004.html
[2] http://libvirt.org/hacking.html
<snip/>
ACKed and pushed. I've committed the patch under your name. I hope that's okay with you.
Yes, much appreciated. I tried to send the patch, but perhaps that doesn't work easily via GMail.
The best way to send patches is not to go through your mail client at all, but to use git send-email instead:

  libvirt.git $ git format-patch -4
  libvirt.git $ git send-email 000*

or, shortened:

  libvirt.git $ git send-email -4

Michal
participants (2)
- Michal Privoznik
- Yaniv Kaul