On Thu, Apr 04, 2019 at 11:48:40AM +0200, Peter Krempa wrote:
> On Thu, Apr 04, 2019 at 09:49:16 +0100, Daniel Berrange wrote:
> > On Thu, Apr 04, 2019 at 10:01:27AM +0200, Bjoern Walk wrote:
> > > This patch series introduces the ability to save additional information
> > > for the domain state and exposes this information in virsh domstate.
> > >
> > > For example in the case of QEMU guest panic events, we can provide
> > > additional information like the crash reason or register state of the
> > > domain. This information usually gets logged in the domain log but for
> > > debugging it is useful to have it accessible from the client. Therefore,
> > > let's introduce a new public API function, virDomainGetStateParams, an
> > > extensible version of virDomainGetState, which returns the complete
> > > state of the domain, including newly introduced additional information.
> > >
> > > Let's also extend virsh domstate and introduce a new parameter --info
> > > to show the domain state, reason and additional information when
> > > available.
> > >
> > >   virsh # domstate --info guest-1
> > >   crashed (panicked)
> > >   s390.core = 0
> > >   s390.psw-mask = 0x0002000180000000
> > >   s390.psw-addr = 0x000000000010f146
> > >   s390.reason = disabled-wait
> >
> > This info is all just guest panic related data, so I'm not convinced we
> > should overload "domstate" for this random set of low level hardware
> > parameters.
>
> I'm not even sure whether it's worth having an API to query it at all.
>
> We don't really have means to store the data reliably across libvirtd
> restarts as there is no status XML for inactive VMs. This means that the
> data will get lost. It also will become instantly invalidated when
> starting the VM.

I'm not bothered about losing it when starting the VM. I tend to view
it as "point in time" data about CPU state.

> I think the most immediately useful thing is to actually include this
> in an async event when the crash happens.

It is possible to configure the panic action so that either QEMU is
killed (and optionally restarted), or QEMU simply has its CPUs
stopped.
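
In domain XML terms that is the <on_crash> action, e.g.

  <on_crash>preserve</on_crash>   <!-- guest kept around, CPUs stopped -->

as opposed to "destroy" / "coredump-restart" etc for the kill (and
optionally restart) behaviours.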

In the latter case I think it is reasonable to have an API to report
it, and this lets us save it in the domain state XML too.

As soon as QEMU is actually stopped though, I think we should no
longer try to report it.

IOW, apps should use the event primarily. If they want to get it via
the API, they must configure QEMU to stop CPUs on panic, instead of
shutting down/starting.
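
Roughly like this on the client side (a minimal untested sketch,
error handling omitted; the event and its constants exist today):

  #include <stdio.h>
  #include <libvirt/libvirt.h>

  /* Fired for lifecycle events; a guest panic shows up as
   * VIR_DOMAIN_EVENT_CRASHED with detail ..._CRASHED_PANICKED */
  static int
  lifecycleCb(virConnectPtr conn, virDomainPtr dom,
              int event, int detail, void *opaque)
  {
      if (event == VIR_DOMAIN_EVENT_CRASHED &&
          detail == VIR_DOMAIN_EVENT_CRASHED_PANICKED)
          fprintf(stderr, "Domain %s panicked\n", virDomainGetName(dom));
      return 0;
  }

  int main(void)
  {
      virConnectPtr conn;

      virEventRegisterDefaultImpl();            /* default event loop */
      conn = virConnectOpen("qemu:///system");
      virConnectDomainEventRegisterAny(conn, NULL, /* NULL == all domains */
                                       VIR_DOMAIN_EVENT_ID_LIFECYCLE,
                                       VIR_DOMAIN_EVENT_CALLBACK(lifecycleCb),
                                       NULL, NULL);
      while (virEventRunDefaultImpl() == 0)     /* dispatch events */
          ;
      return 0;
  }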

I think this would fit well with my suggestion to consider this API
as a way to report live CPU registers (eg a libvirt API to expose
QEMU's "info registers" data). Obviously that would require a QMP
impl first. Just saying this from a conceptual design POV.
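
(For reference, that data can already be peeked at today via the
unsupported monitor passthrough, e.g.

  virsh qemu-monitor-command --hmp guest-1 'info registers'

which is exactly why a proper QMP impl would be needed first.)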

> Currently we log the data into the domain log file which in my opinion
> feels good enough for this kind of low level information which is not
> of much use for non-devs.

We really should add an API via virStream to fetch logs.
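
Something along these lines perhaps (virDomainOpenLog and mySink are
invented names, nothing like this exists today):

  /* Hypothetical log-fetch API built on the existing virStream APIs */
  virStreamPtr st = virStreamNew(conn, 0);
  if (virDomainOpenLog(dom, st, 0) == 0)   /* invented API name */
      virStreamRecvAll(st, mySink, NULL);  /* mySink: a virStreamSinkFunc
                                              writing the data to stdout */
  virStreamFree(st);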

> Additionally the advantage of logging is that bug reporting tools
> usually capture the VM log files.

It is definitely good to have it in the logs, but at the same time this
is structured data, so it is good to preserve the structure for apps
that want to do live analysis without needing a human to read the logs.
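
E.g. with the proposed virDomainGetStateParams an app could do roughly
this (a fragment only - I'm guessing at the exact signature, the real
one is whatever the series defines):

  int state, reason, nparams = 0, i;
  virTypedParameterPtr params = NULL;

  /* Hypothetical call shape for the proposed API */
  if (virDomainGetStateParams(dom, &state, &reason,
                              &params, &nparams, 0) == 0) {
      for (i = 0; i < nparams; i++)
          printf("%s\n", params[i].field);  /* e.g. "s390.reason" */
      virTypedParamsFree(params, nparams);
  }
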
Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|