On 12/6/19 9:36 AM, Eric Blake wrote:
[adding in Peter Maydell, as there is now potential talk of other
4.2-worthy patches]
On 12/6/19 4:18 AM, Vladimir Sementsov-Ogievskiy wrote:
> 05.12.2019 23:16, John Snow wrote:
>>
>>
>> On 12/5/19 3:09 PM, Eric Blake wrote:
>>> On 12/5/19 1:30 PM, Vladimir Sementsov-Ogievskiy wrote:
>>>> Here is double bug:
>>>>
>>>> First, return error but not set errp. This may lead to:
>>>> qmp block-dirty-bitmap-remove may report success when actually failed
>>>>
>>>> block-dirty-bitmap-remove used in a transaction will crash, as
>>>> qmp_transaction will think that it returned success and will cal
>>>
>>> call
>>>
>>>> block_dirty_bitmap_remove_commit which will crash, as state->bitmap
is
>>>> NULL
>>>>
>>>> Second (like in anecdote), this case is not an error at all. As it is
>>>> documented in the comment above bdrv_co_remove_persistent_dirty_bitmap
>>>> definition, absence of bitmap is not an error, and similar case
>>>> handled
>>>> at start of qcow2_co_remove_persistent_dirty_bitmap, it returns 0 when
>>>> there is no bitmaps at all..
>>>
>>> double .
>>>
>>>>
>>>> But when there are some bitmaps, but not the requested one, it return
>>>> error with errp unset.
>>>>
>>>> Fix that.
>>>>
>>>> Fixes: b56a1e31759b750
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy
<vsementsov(a)virtuozzo.com>
>>>> ---
>>>>
>>>> Hi all!
>>>>
>>>> Ohm, suddenly we faced this bug. It's a regression in 4.2. I'm
very
>>>> sorry for introducing it, and it sad that it's found so late..
>>>>
>>>> Personally, I think that this one worth rc5, as it makes new bitmap
>>>> interfaces unusable. But the decision is yours.
>>>>
>>>> Last minute edit: hmm, actually, transaction action introduced in
>>>> 4.2, so crash is not a regression, only broken
>>>> block-dirty-bitmap-remove
>>>> command is a regression... Maybe it's OK for stable.
>>>
>>> Libvirt REALLY wants to use transaction bitmap management (and require
>>> qemu 4.2) for its incremental backup patches that Peter is almost ready
>>> to merge in. I'm trying to ascertain:
>>>
>>> When exactly can this bug hit? Can you give a sequence of QMP commands
>>> that will trigger it for easy reproduction? Are there workarounds
>>> (such
>>> as checking that a bitmap exists prior to attempting to remove it)? If
>>> it does NOT get fixed in rc5, how will libvirt be able to probe whether
>>> the fix is in place?
>>>
>>
>> It looks like:
>>
>> - You need to have at least one bitmap
>> - You need to use transactional remove
>> - you need to misspell the bitmap name
>> - The remove will fail (return -EINVAL) but doesn't set errp
>> - The caller chokes on incomplete information, state->bitmap is NULL
>
>
> No, that would be too simple, the thing is worse. Absolutele correct
> removing is broken, without any misspelling
>
> Bug triggers when we are removing persistent bitmap that is not stored
> yet in the image AND at least one (another) bitmap already stored in
> the image. So, something like:
>
> 1. create persistent bitmap A
> 2. shutdown vm (bitmap A is synced)
> 3. start vm
> 4. create persistent bitmap B
> 5. remove bitmap B - it fails (and crashes if in transaction)
>
> ====
>
> Hmm, workaround..
>
> I'm afraid that the only thing is not remove persistent bitmaps, which
> were never synced to the image. So, instead the sequence above, we need
>
>
> 1. create persistent bitmap A
> 2. shutdown vm
> 3. start vm
> 4. create persistent bitmap B
> 5. remember, that we want to remove bitmap B after vm shutdown
> ...
> some other operations
> ...
> 6. vm shutdown
> 7. start vm in stopped mode, and remove all bitmaps marked for removing
> 8. stop vm
>
> But, I think that in real circumstances, vm shutdown is rare thing...
This is sounding a bit more serious. As I said earlier, it shouldn't
delay 4.2 on its own, but if the fix is obvious (and other than
comments, it is a single change from 'ret = -EINVAL' to 'ret = 0' which
fixes a definite reproducible crash), I think it rises to the level of
acceptable.
I've been so worried about the question of which release, that I don't
know if I've previously offered:
Reviewed-by: Eric Blake <eblake(a)redhat.com>
Oh, that is quite a bit more serious than I thought too.
Yeah, I want this in 4.2 if at all possible.
Reviewed-by: John Snow <jsnow(a)redhat.com>