On Fri, 6 Jan 2012 11:06:12 +0000
Stefan Hajnoczi <stefanha(a)gmail.com> wrote:
On Thu, Jan 5, 2012 at 8:26 PM, Luiz Capitulino
<lcapitulino(a)redhat.com> wrote:
> On Thu, 05 Jan 2012 09:56:44 -0600
> Anthony Liguori <aliguori(a)us.ibm.com> wrote:
>
>> On 01/05/2012 09:35 AM, Eric Blake wrote:
>> > On 01/05/2012 07:16 AM, Luiz Capitulino wrote:
>> >>> I know. We're stuck in a hard place here again because NotSupported
>> >>> has been in the Image Streaming API spec and hence implemented in
>> >>> libvirt for a while now. If we change this then an old client which
>> >>> only understands NotSupported will not know what to do with the
>> >>> Unsupported error.
>> >>>
>> >>> (Unsupported was not in QEMU when the Image Streaming API was defined.)
>> >>
>> >> Let me try to understand it: is libvirt relying on an off tree API and
>> >> we are now required to have stable guarantees to off tree APIs?
>> >
>> > No. Libvirt recognizes the off-tree spelling, but does not rely on it -
>> > after all, the goal of libvirt is to provide the high level action,
>> > using whatever underlying mechanism(s) necessary to get to that action,
>> > even if it means using several different attempts until one actually works.
>> >
>> > If a user has the older libvirt, which only expects the off-tree
>> > spelling, then that user's setup will break if they upgrade qemu but not
>> > libvirt. But that's not a severe problem - they could have only been
>> > relying on the situation if they were using an off-tree build in the
>> > first place, so they should be aware that upgrading qemu is a
>> > potentially risky scenario, and that they may have to deal with the pieces.
>>
>> Right, this is the difference between ABI compatibility and strict backwards
>> compatibility.
>>
>> To achieve ABI compatibility, we need to not overload BLOCK_JOB_COMPLETED to
>> mean something other than what libvirt expects it to mean.
>>
>> We MUST provide ABI compatibility and SHOULD provide backwards compatibility
>> whenever possible.
>>
>> In this case, I'd suggest that in the very least, we should add
>> BLOCK_JOB_COMPLETED to qapi-schema.json with gen=False set. That way it's
>> codified in the schema to ensure we maintain ABI compatibility.
>>
>> That said, I'm inclined to say that we should just use the BLOCK_JOB_COMPLETED
>> name because I don't think we gain a lot by using QMP_JOB_COMPLETED (not that we
>> shouldn't introduce it, but using it here isn't going to make or break anything).
>
> What I'm proposing is not just a rename, but adding proper async support to QMP,
> instead of adding something that is specific to the block layer.
Proper async support - if you mean the ability to have multiple QMP
commands pending at a time - is harder than just fixing QEMU. Clients
also need to start taking advantage of it. Clients that do not will
be unable to continue when a QMP command takes a long time to
complete.
They can be fixed if we offer proper async support. Today they can't.
I think avoiding long-running QMP commands is a good idea. We have
events which can be used to signal completion. It's easy to implement
and does not require clients to change the way they think about QMP
commands.
I agree in principle, but in practice we risk having different subsystems
and different commands introducing their own async support which is going
to make our API (which is already far from perfect) impossible to use,
not to mention the maintainability hell that will arise from it.
Note that I'm not exactly advocating for heavyweight async support, I just
want to avoid having to keep messing with this area.
Maybe, we could go real simple by having a standard event for
asynchronous commands, say ASYNC_CMD_FINISHED or something and that event
would contain only the command id and if the command succeeded or
failed. The APIs for cancelling and querying would have to be provided
by the command itself.
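
To make the idea concrete, here is a rough sketch of how a client might
consume such an event. Everything specific in it is an assumption for
illustration: the event name ASYNC_CMD_FINISHED is only the placeholder
suggested above, and the "id" and "success" data fields are hypothetical.
Only the outer "event"/"data" envelope matches what QMP events look like
today.

```python
import json

# Hypothetical event name and payload fields (not part of QEMU).
ASYNC_EVENT = "ASYNC_CMD_FINISHED"


class AsyncTracker:
    """Track commands that return immediately but finish later."""

    def __init__(self):
        self.pending = {}    # command id -> command name
        self.finished = {}   # command id -> True (succeeded) / False (failed)

    def started(self, cmd_id, name):
        """Record that a long-running command was accepted by the monitor."""
        self.pending[cmd_id] = name

    def handle_line(self, line):
        """Feed one JSON line from the monitor; return True if consumed."""
        msg = json.loads(line)
        if msg.get("event") != ASYNC_EVENT:
            return False          # some other event or a command reply
        data = msg.get("data", {})
        cmd_id = data.get("id")
        if cmd_id not in self.pending:
            return False          # not ours, or already completed
        del self.pending[cmd_id]
        self.finished[cmd_id] = bool(data.get("success"))
        return True


tracker = AsyncTracker()
tracker.started("cmd-7", "some-long-running-command")
tracker.handle_line(
    '{"event": "ASYNC_CMD_FINISHED", "data": {"id": "cmd-7", "success": true}}'
)
```

The event only says that a given command id finished and whether it
succeeded; cancelling and querying would still go through whatever
commands the subsystem itself provides.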
I can start a new thread to discuss async support. I haven't done it yet
because I don't have a concrete proposal and I also suspect that people are
tired of discussing this over and over again.
Today I doubt many QMP clients have implemented multiple pending
commands, although the wire protocol allows it.
That's true, but adding the id field in the command dict was silly, as
we don't support multiple pending commands.
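
For reference, this is roughly what a client would have to do with the id
field if multiple pending commands were ever supported. It is only a
sketch: the PendingCommands class and the "cmd-N" id scheme are made up for
illustration, and the out-of-order reply at the end is hypothetical, since
commands are not processed concurrently today.

```python
import json


class PendingCommands:
    """Match replies to commands by the "id" field of the wire protocol."""

    def __init__(self):
        self._next = 0
        self._waiting = {}   # id -> command name

    def build(self, name, arguments=None):
        """Return the JSON text for a command tagged with a fresh id."""
        self._next += 1
        cmd_id = "cmd-%d" % self._next
        cmd = {"execute": name, "id": cmd_id}
        if arguments:
            cmd["arguments"] = arguments
        self._waiting[cmd_id] = name
        return json.dumps(cmd)

    def match(self, line):
        """Return (command name, reply dict) for a reply, else None."""
        msg = json.loads(line)
        if "return" not in msg and "error" not in msg:
            return None      # an event or the greeting, not a reply
        name = self._waiting.pop(msg.get("id"), None)
        if name is None:
            return None      # unknown or missing id
        return name, msg


p = PendingCommands()
first = p.build("query-status")
second = p.build("query-block")
# Hypothetical: the second command's reply arrives before the first.
reply = p.match('{"return": [], "id": "cmd-2"}')
```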
>> With respect to libvirt relying on interfaces before they exist in QEMU, we need
>> to be a bit flexible here. We want to get better at co-development to help make
>> libvirt support QEMU features as the bleeding edge.
>>
>> Forcing libvirt to wait until a feature is fully baked in QEMU will ensure
>> there's always a feature gap in libvirt which is in none of our best interests.
>
> We can ask them to wait at least until the API is merged. Most good review
> and potential problems will only come when the patches are worked on and
> reviewed on the list.
The API was agreed on between QEMU and libvirt developers on the mailing
lists - you were included in that process. Back in August I sent
patches which you saw ("[0/4] Image Streaming API").
I know the API is not what we'd design today when it comes to the
cosmetics. We'd want to name things differently, use the Unsupported
error which was introduced in the meantime, and maybe make the job
completion concept generic.
QMP and QAPI have evolved in the time that this feature has been
reimplemented. I have tried to keep up with QMP but the API itself is
from August. We can't keep redrawing the lines.
In summary:
* The API was designed and agreed several months ago.
* You saw it back then and I've kept you up-to-date along the way.
* It predates current QAPI conventions.
* Merging it poses no problem; changing the API breaks existing libvirt.
It does pose problems. The name changes I've proposed are not minor
things, it's about conforming to the protocol which is quite important.
Duplicating errors is something that just doesn't make sense either.
And most importantly, you're adding async support to the block layer. This
means that we'll have two different async APIs when we add one to QMP,
or worse other subsystems will be motivated to have their own async APIs
too.
I do feel bad that the code has been out of qemu.git for so long and I
certainly won't attempt this again in the future. But I really think
the pros and cons say we should accept it as an August 2011 API just
like many of the other HMP/QMP commands we carry.
I disagree. This should be reviewed and changed like any other submission.
Regarding being more flexible about working together with libvirt, I
do think it's important to work on APIs together. This avoids us
developing something purely from the QEMU internal perspective which
turns out to be unconsumable by our biggest QMP user :).
We do work together with them. I've never ignored their opinion and I'm
probably the most strongly opinionated when it comes to compatibility.
I just can't see how accepting something that is now rotted is going
to help either of us.