On 10/19/2016 04:50 AM, Michal Privoznik wrote:
On 17.10.2016 20:46, Nikolay Shirokovskiy wrote:
> Hi, all.
>
> We would like to use virDomainQemuMonitorCommand to query qemu independently
> of libvirt state. Currently it is not possible. This API call takes the job
> condition just like any other call and thus is unavailable during any lengthy
> (or stuck) synchronous job.
>
> I've already posted this question to the list, I just failed to find the
> reference. Somebody suggested using a proxy (and even offered an
> implementation) in between qemu and libvirt that can inject commands into
> qemu and filter replies. It is not really convenient. This way test setups
> will differ from production and we cannot investigate problems in the
> production environment.
>
> I'd like to drop acquiring the job condition in the call, as this function
> does not deal with libvirt state (except for the taint, but that is ok, we
> will not mess things up here). But this is not enough, we need to make the
> qemu monitor deal with many qemu commands simultaneously. Looks like it is
> quite a big change for a test/debug case. But I guess eventually normal use
> cases can benefit from these monitor changes too. For example, all query API
> calls that query qemu directly can be changed not to wait for some
> synchronous job to finish (qemuDomainGetBlockJobInfo, for example).
IIRC, the last time I looked into this the problem was not on the libvirt
side. QEMU's monitor was unable to process multiple commands at once.
But maybe that's no longer the case, I don't know.
However, what I think we should do is to turn our jobs into sort of RW
locks. That is - we could allow multiple QUERY jobs to happen
simultaneously and leave MODIFY jobs to be exclusive. I think dropping
BeginJob() from an API is a no go as it will definitely bite us in the
future.
Unfortunately, I have no idea what my suggestion would look like in
terms of the code, how difficult it would be to implement, or
whether the monitor code is prepared for that.
Michal
The problem may be that the lock is taken in a lot of places, e.g. when
the command to perform fsfreeze is sent to the guest agent (the monitor is
not touched) and the guest deadlocks inside VSS. In this case I have to use
gdb to diagnose the entire chain, including the serial ports' state.
This is the problem which should be addressed. If the monitor is unable to
answer, OK: the hang is inside QEMU and should be fixed there. But right now
the bottleneck is inside libvirt and it is really painful.
Den