Re: [libvirt] [Qemu-devel] QMP: Supporting off tree APIs

7 Jan 2012

On Fri, 06 Jan 2012 09:08:19 -0600
Anthony Liguori <anthony@codemonkey.ws> wrote:
...
On 01/06/2012 06:45 AM, Luiz Capitulino wrote:
...
On Fri, 6 Jan 2012 11:06:12 +0000
Stefan Hajnoczi <stefanha@gmail.com> wrote:
...
Proper async support - if you mean the ability to have multiple QMP
commands pending at a time - is harder than just fixing QEMU.  Clients
also need to start taking advantage of it.  Clients that do not will
be unable to continue when a QMP command takes a long time to
complete.
They can be fixed if we offer proper async support. Today they can't.
...
I think avoiding long-running QMP commands is a good idea.  We have
events which can be used to signal completion.  It's easy to implement
and does not require clients to change the way they think about QMP
commands.
I agree in principle, but in practice we risk having different subsystems
and different commands introducing their own async support which is going
to make our API (which is already far from perfect) impossible to use,
not to mention the maintainability hell that will arise from it.
I absolutely agree with you but practically speaking, we don't have generic 
async support today.
It's been my experience that holding up patch series for generic infrastructure
that doesn't exist 1) causes unnecessary angst in contributors and 2) puts pressure on
the infrastructure to get something in fast vs. get something in that's good.
And honestly, it's (2) that I worry the most about.  I don't want us to rush 
async support because we're eager to get block streaming merged.  This is why 
I'm not holding any new devices back while we get QOM merged even if it creates 
more work for me and introduces new compatibility problems to solve.
I agree.
...
We also need to look at this interface as a public interface whether or not we
technically committed to it.  The fact is, an important user is relying
upon it, so that makes it a supported interface.  Even though I absolutely hate it,
this is why we haven't changed the help output even after all of these years. 
Not breaking users should be one of our highest priorities.
One thing I don't understand: how is libvirt relying on it if it doesn't
exist in qemu.git yet?
...
Now we could change this command to make it a better QMP interface and we could 
do that in a compatible fashion.
However, since I think we'll get proper async support really soon and that will 
involve fundamentally changing this command (along with a bunch of other 
commands), I don't think there's a lot of value in making cosmetic changes right 
now.  If we're going to break backwards compatibility, I'd rather do it once 
than twice.
It goes beyond cosmetic changes. For example, will we allow other async
block commands to use this interface? And if we're doing this for block,
why not accept something similar for other subsystems if someone happens to
submit it?

Let me take a non-cosmetic change request example. The BLOCK_JOB_COMPLETED
event has an 'error' field. However, it's impossible to know which error
happened because the 'error' field contains only the human error description.

Another problem: the event is called BLOCK_JOB_COMPLETED, but it's tied
to the streaming API. If we allow other commands to use it, they will likely
have to add fields there, making the event worse than it already is.

There's more, because I skipped this review in v3 as I jumped to the
"proper async support" discussion...
...
What I'd suggest is that we take the command in as-is and we mark it:
Since: 1.1
Deprecated: 1.2
See Also: TBD
The idea being that we'll introduce new generic async commands in 1.2 and 
deprecate this command.  We can figure out the removal schedule then too.  Since 
this command hasn't been around all that long, we can probably have a short 
removal schedule.
That makes its inclusion even more debatable :) A few (very honest) questions:

 1. Is it really worth it to have the command for one or two releases?

 2. Will we allow other block commands to use this async API?

 3. Are we going to accept other ad-hoc async APIs until we have a
    proper one?
...
We should also mark the other pseudo-async commands this way too FWIW.
Regards,
Anthony Liguori
...
Note that I'm not exactly advocating for heavyweight async support; I just
want to avoid repeatedly messing with this area.
Maybe, we could go real simple by having a standard event for
asynchronous commands, say ASYNC_CMD_FINISHED or something and that event
would contain only the command id and whether the command succeeded or
failed. The APIs for cancelling and querying would have to be provided
by the command itself.
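To make the ASYNC_CMD_FINISHED idea a bit more concrete, here is a rough Python sketch of how a client might consume such an event. The event does not exist in QEMU; its name is taken from the proposal above, while the `data` members `id` and `success` and the helper functions are assumptions made up for illustration. Cancelling and querying are left to each command, as suggested.

```python
import json

# Hypothetical event name from the proposal; not an existing QMP event.
ASYNC_EVENT = "ASYNC_CMD_FINISHED"

def make_async_finished_event(cmd_id, succeeded, seconds=0, microseconds=0):
    """Build the minimal proposed event: only the command id and a success
    flag. The 'data' member names ('id', 'success') are assumptions."""
    return {
        "event": ASYNC_EVENT,
        "data": {"id": cmd_id, "success": succeeded},
        "timestamp": {"seconds": seconds, "microseconds": microseconds},
    }

def handle_event(raw, pending):
    """Resolve a pending async command when its completion event arrives.

    'pending' maps command id -> description of the command that was issued.
    Returns (command, succeeded), or None if the event is unrelated."""
    msg = json.loads(raw)
    if msg.get("event") != ASYNC_EVENT:
        return None  # other events are handled elsewhere
    data = msg["data"]
    command = pending.pop(data["id"], None)
    if command is None:
        return None  # completion for a command this client did not issue
    return command, data["success"]

pending = {"stream-1": "block_stream virtio0"}
wire = json.dumps(make_async_finished_event("stream-1", True))
print(handle_event(wire, pending))  # ('block_stream virtio0', True)
print(pending)                      # {}
```

The point of keeping the event this small is that it stays generic: any subsystem could emit it without adding its own fields.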
I can start a new thread to discuss async support. I haven't done it yet
because I don't have a concrete proposal and I also suspect that people are
tired of discussing this over and over again.
...
Today I doubt many QMP clients have implemented multiple pending
commands, although the wire protocol allows it.
That's true, but adding the id field in the command dict was silly, as
we don't support multiple pending commands.
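For reference, QMP echoes a command's optional `id` member back in the matching reply, which is what a client would use to pair replies with commands if several could be pending. The Python sketch below (a hypothetical `PendingTracker` class that only builds and parses JSON, with no socket I/O) shows that bookkeeping. Since QEMU currently handles one command at a time, replies simply arrive in order; the out-of-order case in the usage lines is illustrative only.

```python
import json
from itertools import count

class PendingTracker:
    """Route QMP replies to the commands that produced them via the 'id'
    member that the server echoes back. Sketch only: no socket I/O."""

    def __init__(self):
        self._ids = count(1)
        self.pending = {}  # id -> command name
        self.events = []   # asynchronous events (they carry no 'id')

    def send(self, command, arguments=None):
        """Return the JSON text for a command tagged with a fresh id."""
        cmd_id = "cmd-%d" % next(self._ids)
        msg = {"execute": command, "id": cmd_id}
        if arguments:
            msg["arguments"] = arguments
        self.pending[cmd_id] = command
        return json.dumps(msg)

    def receive(self, raw):
        """Classify one incoming line; return (command, reply) for replies."""
        msg = json.loads(raw)
        if "QMP" in msg or "event" in msg:
            if "event" in msg:
                self.events.append(msg["event"])
            return None  # greeting or event, not a reply to a command
        command = self.pending.pop(msg["id"])
        return command, msg.get("return", msg.get("error"))

t = PendingTracker()
t.send("query-status")
t.send("query-block")
# If replies could arrive in any order, the id still pairs them correctly.
print(t.receive('{"return": [], "id": "cmd-2"}'))  # ('query-block', [])
print(t.receive('{"return": {"running": true}, "id": "cmd-1"}'))
```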
...
...
...
With respect to libvirt relying on interfaces before they exist in QEMU, we need
to be a bit flexible here.  We want to get better at co-development to help make
libvirt support QEMU features at the bleeding edge.
Forcing libvirt to wait until a feature is fully baked in QEMU will ensure
there's always a feature gap in libvirt, which is in no one's best interest.
We can ask them to wait at least until the API is merged. Most good review
and potential problems will only come when the patches are worked on and
reviewed on the list.
The API was agreed on between QEMU and libvirt developers on the mailing
lists - you were included in that process.  Back in August I sent
patches which you saw ("[0/4] Image Streaming API").
I know the API is not what we'd design today when it comes to the
cosmetics.  We'd want to name things differently, use the Unsupported
event which was introduced in the meantime, and maybe make the job
completion concept generic.
QMP and QAPI have evolved in the time that this feature has been
reimplemented.  I have tried to keep up with QMP but the API itself is
from August.  We can't keep redrawing the lines.
In summary:
  * The API was designed and agreed several months ago.
  * You saw it back then and I've kept you up-to-date along the way.
  * It predates current QAPI conventions.
  * Merging it poses no problem, changing the API breaks existing libvirt.
It does pose problems. The name changes I've proposed are not minor
things, it's about conforming to the protocol which is quite important.
Duplicating errors is something that just doesn't make sense either.
And most importantly, you're adding async support to the block layer. This
means that we'll have two different async APIs when we add one to QMP,
or, worse, other subsystems will be motivated to have their own async APIs
too.
...
I do feel bad that the code has been out of qemu.git for so long and I
certainly won't attempt this again in the future.  But I really think
the pros and cons say we should accept it as an August 2011 API just
like many of the other HMP/QMP commands we carry.
I disagree. This should be reviewed and changed like any other submission.
...
Regarding being more flexible about working together with libvirt, I
do think it's important to work on APIs together.  This avoids use
us developing something purely from the QEMU internal perspective that
turns out to be unconsumable by our biggest QMP user :).
We do work together with them. I've never ignored their opinion and I'm
probably the most opinionated one when it comes to compatibility.
I just can't see how accepting something that is now rotted is going
to help either of us.