On 01/07/2014 07:22 AM, Jiri Denemark wrote:
> On Tue, Dec 24, 2013 at 07:22:38 -0700, Eric Blake wrote:
>> On 12/20/2013 02:36 PM, Jiri Denemark wrote:
>>> Generally, every API that is going to begin a job should do that before
>>> fetching data from vm->def. However, qemuDomainGetBlockInfo does not
>>> know whether it will have to start a job or not before checking vm->def.
>>> To avoid using disk alias that might have been freed while we were
>>> waiting for a job, we use its copy. In case the disk was removed in the
>>> meantime, we will fail with "cannot find statistics for device
'...'"
>>> error message.
>>>
>>> Signed-off-by: Jiri Denemark <jdenemar(a)redhat.com>
>>> ---
>>> src/qemu/qemu_driver.c | 17 ++++++++++++-----
>>> 1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> This fixes the potential for a crash, but isn't this related to the same
>> problem that John has been working on? That is, when one thread is
>> migrating a domain and the other is getting block info, we have a race
>> where qemu will quit before we get a chance to probe for the block info.
>>
https://www.redhat.com/archives/libvir-list/2013-December/msg00984.html
>>
>> Would changing when we grab the job (and grabbing it unconditionally,
>> even if we end up not needing to query) help solve both issues?
>
> I might help with what John was trying to do but I don't think this
> patch alone would solve the issue. May I consider this patch ACKed? :-)
I was debating whether a single patch could solve both issues at once,
to avoid rebase churn; but at this point, given that it solves a CVE,
I'm okay giving ACK to your patch as-is.
I tagged this before break meaning to get back to it and respond, but
other things got in the way... Thanks for the prod...
Not sure this change will affect what's seen in the issue I looked at
before the break. In that case, the code returns success (eg, 0 - zero)
when virDomainObjIsActive() fails and that leaves "info->allocation =
info->physical;" which ends up being unexpected from the callers POV.
That case wanted a way to "detect" (via failure) that although the guest
is not marked as shutoff/down it's not returning the "real" allocation
value because we've hit a point in time during the migration where we
cannot get the answer from QEMU.
John