On 30.09.2016 00:11, John Ferlan wrote:
On 09/27/2016 05:06 AM, Nikolay Shirokovskiy wrote:
>
>
> On 26.09.2016 23:07, John Ferlan wrote:
>>
>>
>> On 09/07/2016 05:24 AM, Nikolay Shirokovskiy wrote:
>>> A special error code helps to gracefully handle race conditions when
>>> cancelling a blockjob. Consider pull backup for example. We want
>>> to stop it, and in the process we cancel the blockjob for some of the
>>> exported disks. But by the time this command reaches qemu, the blockjob
>>> has failed for some reason, so the result of the cancel (and thus of
>>> stopping the backup) will be an error with a quite unexpected message -
>>> no such job (sort of). Instead we can detect this race, treat it as a
>>> correct cancellation, and wait until the fail event arrives. Then we can
>>> report a correct error message, probably with some details from qemu.
>>>
>>> Users of domainBlockJobAbort can be affected, as -2 is a new possible
>>> return code. Not sure if I should stick to the original behaviour in
>>> this case.
>>> ---
>>> src/qemu/qemu_monitor.c | 5 +++++
>>> src/qemu/qemu_monitor_json.c | 11 +++++++----
>>> 2 files changed, 12 insertions(+), 4 deletions(-)
>>>
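The race the patch describes can be sketched roughly as follows. This is an illustrative toy only, with simplified stand-ins: the function and parameter names are invented for the example, and only the -2 convention mirrors the patch.

```c
/* Pretend monitor call: 0 on success, -1 on generic failure,
 * -2 when qemu answers DeviceNotActive (the job is already gone).
 * Invented names; not the real libvirt implementation. */
int
monitor_block_job_cancel(int job_already_gone)
{
    if (job_already_gone)
        return -2;   /* the special code this patch introduces */
    return 0;
}

/* Caller: -2 means the job vanished before the cancel arrived, i.e.
 * the race described above.  Treat it as a successful cancellation and
 * remember to wait for the fail event that carries the real details. */
int
cancel_disk_backup(int job_already_gone, int *wait_for_event)
{
    int rc = monitor_block_job_cancel(job_already_gone);

    if (rc == -2) {
        *wait_for_event = 1;  /* report details from the event later */
        return 0;             /* not a failure of the cancel itself */
    }
    *wait_for_event = 0;
    return rc;
}
```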
>>
>> I haven't looked forward to patch 3/3 yet, but perhaps it'd be better in
>> the "caller" that cares to:
>>
>>     if (qemu*() < 0) {
>>         virErrorPtr err = virGetLastError();
>>         if (err && err->code == VIR_ERR_OPERATION_INVALID) {
>>             /* Do something specific */
>>         }
>>     }
>>
>>
>> Returning -2 alters virDomainBlockJobAbort's expected return values,
>> and that's probably not a good idea since we don't know if some caller
>> has written (if virDomainBlockJobAbort() == -1) instead of <= -1
>>
>
> It is commonplace in libvirt to have special error codes instead of only
> 0/-1 (qemuMonitorJSONCheckCPUx86), and it feels bulky to have checks that
> involve virGetLastError. It looks like other places that use a similar
> construction have no other choice because of an RPC call or something.
True - my concern was more that virDomainBlockJobAbort now has a new
possible error value of -2 that's not described. Still, just adding to
the documentation that it can now return -1 or -2 probably isn't the
best idea, especially if what you're trying to do is resolve/handle an
issue for a specific case within qemu code.
Other APIs within the bowels of qemu (monitor, monitor_json, etc.) that
return -2 usually have that handled somehow by the caller, but before
returning out into what would be API land they return what the API has
documented ({0, -1}, {true, false}, {NULL, address}, {count, 0, -1}, etc.).
Changing an external API return value is a risky proposition - it works for
some instances and fails for others... I've seen code that does this:
    if (public_api(arg1, arg2) == -1) {
        print some error message
        return failure
    }

even though it's been strongly suggested to do something like

    if (public_api(arg1, arg2) < 0) {
        get specific error (usually via errno) and perhaps
        make a decision based on that... but fall through to
        print some error message
        return failure
    }
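The difference between the two comparison styles can be shown with a toy example (not a real libvirt API; `public_api` and the caller names are invented): an API that once returned only {0, -1} later grows a -2 code, and the `== -1` caller silently misses it.

```c
/* Toy "public API": historically returned {0, -1}; now it can also
 * return -2.  Both negative values are failures. */
int
public_api(int trigger_new_error_code)
{
    return trigger_new_error_code ? -2 : -1;
}

/* Compares == -1: the new -2 failure slips through as "success". */
int
fragile_caller(int trigger_new_error_code)
{
    if (public_api(trigger_new_error_code) == -1)
        return -1;   /* failure detected */
    return 0;        /* oops: -2 lands here and looks like success */
}

/* Compares < 0: catches -1, -2, and any future negative code. */
int
robust_caller(int trigger_new_error_code)
{
    if (public_api(trigger_new_error_code) < 0)
        return -1;   /* failure detected, whatever the code */
    return 0;
}
```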
>
> I think that when a function raises an error, not much care is taken in
> choosing the error code value, as the codes are not really semantic
> (at least for such a common error as 'operation invalid'), so one day
> the caller could catch 'operation invalid' raised for a different
> reason from a different place in the stack.
>
> As to virDomainBlockJobAbort, it is easy to patch :)
>
> I think I should also remove the error reporting from qemuMonitorJSONBlockJobError
> in the DeviceNotActive case; however, it is used from several places, even
> for things like setting the speed. That is a legacy of commit b976165c, when
> a lot of qemu commands shared common error checking code. I think I'd
> better copy this code to qemuMonitorJSONBlockJobCancel and change it there
> instead of in the common place. What do you think?
>
I think you need to think about all the callers of
qemuMonitorJSONBlockJobError... I almost hate to use the word fragile
to describe things, but there is a lot of complexity involved with
respect to all the consumers. I just recall hearing about someone doing
a lot of muttering while implementing the code. The snapshot code in
particular (which you seem to have copied to a degree) has been the
source of some fairly "exotic" bug reports.
While the snapshot and migration code was a source of inspiration for
me, I tried to keep an understanding of how it applies to backups :)
In particular, in the main patch I rewrote the functions that cancel
multiple blockjob operations a bit.
I personally don't have a lot of insight into the totality of the
blockdev* code. I'd need to spend some time to really understand things
better and provide a more clear/concise answer...
Ok, so after all I'll change this patch to keep the virDomainBlockJobAbort
API unchanged and will not touch qemuMonitorJSONBlockJobError (just spawn
a copy for qemuMonitorJSONBlockJobCancel).
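The plan above could look roughly like this. A hypothetical sketch only: the helper name is invented, and it models just the classification step on the qemu error class string (NULL on success) rather than parsing a real JSON reply, so the shared qemuMonitorJSONBlockJobError stays untouched.

```c
#include <string.h>

/* Invented for the example: a cancel-specific copy of the error check,
 * so DeviceNotActive can map to -2 without disturbing the common error
 * handler used by the other blockjob commands. */
int
block_job_cancel_classify_error(const char *error_class)
{
    if (error_class == NULL)
        return 0;    /* command succeeded */
    if (strcmp(error_class, "DeviceNotActive") == 0)
        return -2;   /* job already gone: caller treats cancel as done */
    return -1;       /* any other qemu error: plain failure */
}
```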
Nikolay