On 28.11.2019 14:11, Peter Krempa wrote:
On Thu, Nov 28, 2019 at 10:51:56 +0000, Nikolay Shirokovskiy wrote:
>
>
> On 28.11.2019 12:05, Peter Krempa wrote:
>> On Thu, Nov 28, 2019 at 07:29:08 +0000, Nikolay Shirokovskiy wrote:
>>>
>>>
>>> On 27.11.2019 17:56, Peter Krempa wrote:
>>>> On Wed, Nov 27, 2019 at 17:19:18 +0300, Nikolay Shirokovskiy wrote:
>>>>> Due to a race, qemuDomainGetBlockJobInfo can report that there is
>>>>> no block job for a disk, yet a later call to spawn a new blockjob
>>>>> can fail because libvirt has not yet processed the blockjob
>>>>> completion internally. Thus let's wait for the blockjob to finish
>>>>> if we report there is no more blockjob.
>>>>
>>>> Could you please elaborate how this happened? e.g. provide some logs?
>>>>
>>>> Polling with qemuDomainGetBlockJobInfo will always be racy and if the
>>>> problem is that diskPriv->blockjob is allocated but qemu didn't report
>>>> anything I suspect the problem is somewhere else.
>>>>
>>>>
>>>>>
>>>>> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy(a)virtuozzo.com>
>>>>> ---
>>>>> src/qemu/qemu_driver.c | 18 +++++++++++++++++-
>>>>> 1 file changed, 17 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>>>>> index 669c12d6ca..b148df3a57 100644
>>>>> --- a/src/qemu/qemu_driver.c
>>>>> +++ b/src/qemu/qemu_driver.c
>>>>> @@ -17785,12 +17785,26 @@ qemuDomainGetBlockJobInfo(virDomainPtr dom,
>>>>>          goto endjob;
>>>>>      }
>>>>>
>>>>> +    qemuBlockJobSyncBegin(job);
>>>>> +
>>>>>      qemuDomainObjEnterMonitor(driver, vm);
>>>>>      ret = qemuMonitorGetBlockJobInfo(qemuDomainGetMonitor(vm),
>>>>>                                       job->name, &rawInfo);
>>>>>      if (qemuDomainObjExitMonitor(driver, vm) < 0)
>>>>>          ret = -1;
>>>>> -    if (ret <= 0)
>>>>> +    if (ret < 0)
>>>>> +        goto endjob;
>>>>> +
>>>>> +    if (ret == 0) {
>>>>
>>>> So if qemu returns that there is no blockjob ...
>>>>
>>>>> +        qemuBlockJobUpdate(vm, job, QEMU_ASYNC_JOB_NONE);
>>>>> +        while (qemuBlockJobIsRunning(job)) {
>>>>
>>>> but we think it's still running, it's either that the event arrived but
>>>> wasn't processed yet. In that case this would help, but the outcome
>>>> would not differ much from the scenario if the code isn't here, as the
>>>> event would be processed later.
>>>>
>>>>
>>>>> +            if (virDomainObjWait(vm) < 0) {
>>>>> +                ret = -1;
>>>>> +                goto endjob;
>>>>> +            }
>>>>> +            qemuBlockJobUpdate(vm, job, QEMU_ASYNC_JOB_NONE);
>>>>
>>>> If there's a bug and qemu doesn't know that there's a job but libvirt
>>>> thinks there's one but qemu is right, this will lock up here as the
>>>> event will never arrive.
>>>>
>>>>> +        }
>>>>>          goto endjob;
>>>>
>>>> My suggested solution would be to fake the job from the data in 'job'
>>>> rather than attempt to wait in this case e.g.
>>>>
>>>> if (ret == 0) {
>>>>     rawStats.type = job->type;
>>>>     rawStats.ready = 0;
>>>> }
>>>>
>>>> and then make it to qemuBlockJobInfoTranslate which sets some fake
>>>> progress data so that the job looks active.
>>>>
>>>> That way you get a response that there's a job without the potential to
>>>> deadlock.
>>>>
>>>
>>> Fake progress data does not look nice. Maybe we should cache progress on
>>> qemuDomainGetBlockJobInfo calls so it will not go backwards after the patch?
>>
>> I agree but it's always a tradeoff between complexity of the code and
>> benefit of it. If it's a very unlikely scenario, returning fake data is
>> simpler than adding caching and persistence over restarts.
>>
>>> I can understand protecting against possible bugs when recovering is not
>>> possible or difficult, like the case of data corruption. But in this case
>>> it is a matter of a libvirtd restart, I guess.
>>
>> Well, that's why I'm asking what the original bug is. Maybe the problem
>> lies somewhere else and there's a more elegant fix.
>>
>
> I have mgmt polling for blockpull job completion. After completion, failure/success
> is inferred by inspecting the domain xml. So even on success, due to the race we can
> see the backing chain that should have been deleted by blockpull and decide it was
> a failure.
The strongly preferred method to figure out the success of a job is to use
the VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2 event, which is delivered after the
block job is finalized by libvirt.
This means that any backing chain modification/redetection is already
finished by the time it is delivered, and additionally the event reports
the success state.
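
E.g. a minimal sketch of the event-based approach on the management side
(untested; the connection URI, domain name and disk target below are just
placeholders, and error/cleanup handling is omitted):

#include <stdio.h>
#include <libvirt/libvirt.h>

/* With VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2 the 'disk' argument is the disk
 * target name (e.g. "vda") rather than the path. */
static void
blockJobCallback(virConnectPtr conn,
                 virDomainPtr dom,
                 const char *disk,
                 int type,
                 int status,
                 void *opaque)
{
    (void)conn; (void)dom; (void)type; (void)opaque;

    if (status == VIR_DOMAIN_BLOCK_JOB_COMPLETED)
        printf("block job on %s completed\n", disk);
    else if (status == VIR_DOMAIN_BLOCK_JOB_FAILED)
        printf("block job on %s failed\n", disk);
}

int main(void)
{
    virConnectPtr conn;
    virDomainPtr dom;

    if (virEventRegisterDefaultImpl() < 0)
        return 1;
    if (!(conn = virConnectOpen("qemu:///system")))
        return 1;
    if (!(dom = virDomainLookupByName(conn, "guest")))
        return 1;

    if (virConnectDomainEventRegisterAny(conn, dom,
                                         VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2,
                                         VIR_DOMAIN_EVENT_CALLBACK(blockJobCallback),
                                         NULL, NULL) < 0)
        return 1;

    /* start the pull and then simply wait for the event instead of
     * polling virDomainGetBlockJobInfo */
    if (virDomainBlockPull(dom, "vda", 0, 0) < 0)
        return 1;

    while (virEventRunDefaultImpl() == 0)
        ;

    return 0;
}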

Additionally, since libvirt 5.10 and qemu 4.2, blockdev is used, which
impacts block jobs. With blockdev in use we persist the jobs in qemu until
we have processed their state. This means that the above race is
impossible with current qemu, as the block job would still be present in
the output of query-blockjobs until we process the event from qemu and
thus modify the backing chain in the XML while holding one lock.

In the pre-blockdev era I can see that there's a possibility that it could
happen, but in my opinion it's extremely unlikely. The race window is
between qemu removing the job and delivering the event. After the event
is delivered we enqueue it in the job handling and thus it will be
processed in order. This means that the code would have to
call virDomainGetBlockJobInfo at a point where the info from qemu is no
longer present, but that also means that qemu has already sent the event
notifying about the finished job. The window where the virDomainGetXMLDesc
call would have to take place is between those two actions.

Given that with the newest qemu and libvirt the situation will not happen,
and that it's unlikely otherwise, I'm against adding the complex code you
proposed. We can go with the fake progress data if you insist on fixing
the polling scenario, which is suboptimal by itself.
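
Roughly something along these lines in qemuDomainGetBlockJobInfo (untested
sketch; the variable and field names are taken from the quoted hunk, so
treat the details as approximate), with qemuBlockJobInfoTranslate then
filling in placeholder progress data when none was reported:

    if (ret == 0) {
        /* qemu no longer knows the job but we still track it, so
         * synthesize an "active" reply from our own bookkeeping instead
         * of waiting for the event */
        rawInfo.type = job->type;
        rawInfo.ready = 0;
        ret = 1;
    }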

Thanks for the elaborate answer. In the long term the problem is solved by
block jobs being reaped explicitly, so I'm ok with dismissing the patch.

Nikolay